← All models
Google: Gemma 3 12B
google/gemma-3-12b-it
VisionTool useJSONStreaming
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Pricing
Input
$0.04 / 1M
Output
$0.13 / 1M
Specs
Context
131,072 tokens
Input
text, image
Output
text
Knowledge cutoff: 2024-08-31
Released: 2025-03
Supported parameters
frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstopstructured_outputstemperaturetool_choicetoolstop_ktop_p
Open weights · HuggingFace
2,898,033 downloads/mo
726 likes
gemma image-text-to-text arXiv:1905.07830arXiv:1905.10044arXiv:1911.11641arXiv:1904.09728arXiv:1705.03551arXiv:1911.01547arXiv:1907.10641arXiv:1903.00161arXiv:2009.03300arXiv:2304.06364arXiv:2103.03874arXiv:2110.14168arXiv:2311.12022arXiv:2108.07732arXiv:2107.03374arXiv:2210.03057arXiv:2106.03193arXiv:1910.11856arXiv:2502.12404arXiv:2502.21228arXiv:2404.16816arXiv:2104.12756arXiv:2311.16502arXiv:2203.10244arXiv:2404.12390arXiv:1810.12440arXiv:1908.02660arXiv:2312.11805
View on HuggingFace →