← 模型广场

Qwen: Qwen3 VL 32B Instruct

qwen/qwen3-vl-32b-instruct

图像理解工具调用JSON流式

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

价格

输入

$0.104 / 1M

输出

$0.416 / 1M

参数

上下文

262,144 tokens

输入模态

text, image

输出模态

text

发布：2025-10

支持参数

max_tokenspresence_penaltyresponse_formatseedtemperaturetool_choicetoolstop_p

开放权重 · HuggingFace

1,803,347 月下载

202 收藏

apache-2.0 image-text-to-text

arXiv:2505.09388 arXiv:2502.13923 arXiv:2409.12191 arXiv:2308.12966

在 HuggingFace 查看 →

使用该模型 →