← All models
B

ByteDance: UI-TARS 7B

bytedance/ui-tars-1.5-7b

VisionStreaming

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

Pricing

Input

$0.1 / 1M

Output

$0.2 / 1M

Specs

Context

128,000 tokens

Input

image, text

Output

text

Knowledge cutoff: 2025-01-31

Released: 2025-07

Supported parameters

frequency_penaltylogit_biasmax_tokenspresence_penaltyrepetition_penaltyseedstoptemperaturetop_ktop_p