Meta: Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

图像理解JSON流式

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

价格

输入

$0.245 / 1M

输出

$0.245 / 1M

参数

上下文

131,072 tokens

输入模态

text, image

输出模态

text

知识截止：2023-12-31

发布：2024-09

支持参数

frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstoptemperaturetop_ktop_p

开放权重 · HuggingFace

234,184 月下载

1,594 收藏

llama3.2 image-text-to-text

arXiv:2204.05149

在 HuggingFace 查看 →

使用该模型 →