StepFun: Step 3.7 Flash

stepfun/step-3.7-flash

VisionTool useJSONReasoningStreaming

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...

Pricing

Input

$0.2 / 1M

Output

$1.15 / 1M

Specs

Context

256,000 tokens

Input

text, image, video

Output

text

Released: 2026-05

Supported parameters

frequency_penaltyinclude_reasoninglogprobsmax_tokensreasoningresponse_formatstopstructured_outputstemperaturetoolstop_logprobstop_p

Use this model →