Google: Gemma 3 4B

google/gemma-3-4b-it

VisionJSONStreaming

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Pricing

Input

$0.04 / 1M

Output

$0.08 / 1M

Specs

Context

131,072 tokens

Input

text, image

Output

text

Knowledge cutoff: 2024-08-31

Released: 2025-03

Supported parameters

frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstopstructured_outputstemperaturetop_ktop_p

Open weights · HuggingFace

1,797,454 downloads/mo

1,347 likes

gemma image-text-to-text

View on HuggingFace →

Use this model →