Qwen: Qwen3 VL 8B Thinking

qwen/qwen3-vl-8b-thinking

VisionTool useJSONReasoningStreaming

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...

Pricing

Input

$0.117 / 1M

Output

$1.36 / 1M

Specs

Context

256,000 tokens

Input

image, text

Output

text

Released: 2025-10

Supported parameters

include_reasoningmax_tokenspresence_penaltyreasoningresponse_formatseedstructured_outputstemperaturetool_choicetoolstop_p

Open weights · HuggingFace

461,127 downloads/mo

209 likes

apache-2.0 image-text-to-text

arXiv:2505.09388 arXiv:2502.13923 arXiv:2409.12191 arXiv:2308.12966

View on HuggingFace →

Use this model →