← All models View on HuggingFace →
M
Meta: Llama 3.2 11B Vision Instruct
meta-llama/llama-3.2-11b-vision-instruct
VisionJSONStreaming
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
Pricing
Input
$0.245 / 1M
Output
$0.245 / 1M
Specs
Context
131,072 tokens
Input
text, image
Output
text
Knowledge cutoff: 2023-12-31
Released: 2024-09
Supported parameters
frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstoptemperaturetop_ktop_p
Open weights · HuggingFace
234,184 downloads/mo
1,594 likes
llama3.2 image-text-to-text