Meta: Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

VisionJSONStreaming

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Pricing

Input

$0.245 / 1M

Output

$0.245 / 1M

Specs

Context

131,072 tokens

Input

text, image

Output

text

Knowledge cutoff: 2023-12-31

Released: 2024-09

Supported parameters

frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstoptemperaturetop_ktop_p

Open weights · HuggingFace

234,184 downloads/mo

1,594 likes

llama3.2 image-text-to-text

arXiv:2204.05149

View on HuggingFace →

Use this model →