← All models
B
ByteDance: UI-TARS 7B
bytedance/ui-tars-1.5-7b
VisionStreaming
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...
Pricing
Input
$0.1 / 1M
Output
$0.2 / 1M
Specs
Context
128,000 tokens
Input
image, text
Output
text
Knowledge cutoff: 2025-01-31
Released: 2025-07
Supported parameters
frequency_penaltylogit_biasmax_tokenspresence_penaltyrepetition_penaltyseedstoptemperaturetop_ktop_p
Open weights · HuggingFace
442,785 downloads/mo
554 likes
apache-2.0 image-text-to-text arXiv:2501.12326arXiv:2404.07972arXiv:2409.08264arXiv:2401.13919arXiv:2504.01382arXiv:2405.14573arXiv:2410.23218arXiv:2504.07981
View on HuggingFace →