Z.ai: GLM 4.6V

z-ai/glm-4.6v

VisionTool useJSONReasoningStreaming

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts...

Pricing

Input

$0.3 / 1M

Output

$0.9 / 1M

Specs

Context

131,072 tokens

Input

image, text, video

Output

text

Released: 2025-12

Supported parameters

frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningrepetition_penaltyresponse_formatseedstoptemperaturetool_choicetoolstop_ktop_p

Open weights · HuggingFace

3,848 downloads/mo

392 likes

mit image-text-to-text

arXiv:2507.01006

View on HuggingFace →

Use this model →