← 模型广场
M

Meta: Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

图像理解JSON流式

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

价格

输入

$0.245 / 1M

输出

$0.245 / 1M

参数

上下文

131,072 tokens

输入模态

text, image

输出模态

text

知识截止:2023-12-31

发布:2024-09

支持参数

frequency_penaltylogit_biasmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstoptemperaturetop_ktop_p

开放权重 · HuggingFace

234,184 月下载
1,594 收藏
llama3.2 image-text-to-text
在 HuggingFace 查看 →