← All models View on HuggingFace →
Qwen: Qwen3 VL 8B Instruct
qwen/qwen3-vl-8b-instruct
VisionTool useJSONStreaming
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Pricing
Input
$0.08 / 1M
Output
$0.5 / 1M
Specs
Context
256,000 tokens
Input
image, text
Output
text
Released: 2025-10
Supported parameters
frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstopstructured_outputstemperaturetool_choicetoolstop_ktop_p
Open weights · HuggingFace
8,454,065 downloads/mo
922 likes
apache-2.0 image-text-to-text