Z.ai: GLM 4.5V

z-ai/glm-4.5v

VisionTool useJSONReasoningStreaming

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...

Pricing

Input

$0.6 / 1M

Output

$1.80 / 1M

Specs

Context

65,536 tokens

Input

text, image

Output

text

Knowledge cutoff: 2024-12-31

Released: 2025-08

Supported parameters

frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningrepetition_penaltyresponse_formatseedstoptemperaturetool_choicetoolstop_ktop_p

Open weights · HuggingFace

178,678 downloads/mo

718 likes

mit image-text-to-text

arXiv:2507.01006

View on HuggingFace →

Use this model →