← All models
S
StepFun: Step 3.7 Flash
stepfun/step-3.7-flash
VisionTool useJSONReasoningStreaming
Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...
Pricing
Input
$0.2 / 1M
Output
$1.15 / 1M
Specs
Context
256,000 tokens
Input
text, image, video
Output
text
Released: 2026-05
Supported parameters
frequency_penaltyinclude_reasoninglogprobsmax_tokensreasoningresponse_formatstopstructured_outputstemperaturetoolstop_logprobstop_p