Every model, one place.
GptGet AI model API gateway: one OpenAI-compatible key for GPT, Claude, Gemini, DeepSeek, Qwen and more.
stepfun/step-3.7-flash
Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model.
256,000 tokens
anthropic/claude-opus-4.8-fast
Fast-mode variant of [Opus 4.8](/anthropic/claude-opus-4.8) - identical capabilities wit
1,000,000 tokens
anthropic/claude-opus-4.8
Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family
1,000,000 tokens
qwen/qwen3.7-max
Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input an
1,000,000 tokens
x-ai/grok-build-0.1
Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engi
256,000 tokens
google/gemini-3.5-flash
Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level c
1,048,576 tokens
anthropic/claude-opus-4.7-fast
Fast-mode variant of [Opus 4.7](/anthropic/claude-opus-4.7) - identical capabilities wit
1,000,000 tokens
perceptron/perceptron-mk1
Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for vide
32,768 tokens
inclusionai/ring-2.6-1t
Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for
262,144 tokens
google/gemini-3.1-flash-lite
Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-
1,048,576 tokens
openai/gpt-chat-latest
GPT Chat Latest points to OpenAI's stable API alias `chat-latest` that always resolves t
400,000 tokens
x-ai/grok-4.3
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text outpu
1,000,000 tokens
ibm-granite/granite-4.1-8b
Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, par
131,072 tokens
mistralai/mistral-medium-3-5
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It suppo
262,144 tokens
openrouter/owl-alpha
Owl Alpha is a high-performance foundation model designed for agentic workloads. Nativel
1,048,756 tokens
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as
256,000 tokens
poolside/laguna-xs.2:free
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://
262,144 tokens
poolside/laguna-m.1:free
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), opti
262,144 tokens
qwen/qwen3.5-plus-20260420
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It ac
1,000,000 tokens
qwen/qwen3.6-flash
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It sup
1,000,000 tokens
qwen/qwen3.6-35b-a3b
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion to
262,144 tokens
qwen/qwen3.6-max-preview
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse
262,144 tokens
qwen/qwen3.6-27b
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba
262,144 tokens
openai/gpt-5.5-pro
GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy
1,050,000 tokens
openai/gpt-5.5
GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building
1,050,000 tokens
deepseek/deepseek-v4-pro
DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total
1,048,576 tokens
deepseek/deepseek-v4-flash
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with
1,048,576 tokens
inclusionai/ling-2.6-1t
Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-p
262,144 tokens
tencent/hy3-preview
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agen
262,144 tokens
xiaomi/mimo-v2.5-pro
MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agent
1,048,576 tokens
xiaomi/mimo-v2.5
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performan
1,048,576 tokens
openai/gpt-5.4-image-2
[GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model
272,000 tokens
inclusionai/ling-2.6-flash
Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameter
262,144 tokens
openrouter/pareto-code
The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artif
2,000,000 tokens
moonshotai/kimi-k2.6
Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon c
262,144 tokens
moonshotai/kimi-k2.6:free
Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon c
262,144 tokens
anthropic/claude-opus-4.7
Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asyn
1,000,000 tokens
anthropic/claude-opus-4.6-fast
Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6) - identical capabilities wit
1,000,000 tokens
z-ai/glm-5.1
GLM-5.1 delivers a major leap in coding capability, with particularly significant gains
202,752 tokens
google/gemma-4-26b-a4b-it
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google De
262,144 tokens
google/gemma-4-26b-a4b-it:free
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google De
262,144 tokens
google/gemma-4-31b-it
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text a
262,144 tokens
google/gemma-4-31b-it:free
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text a
262,144 tokens
qwen/qwen3.6-plus
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention w
1,000,000 tokens
z-ai/glm-5v-turbo
GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-
202,752 tokens
arcee-ai/trinity-large-thinking
Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee
262,144 tokens
x-ai/grok-4.20-multi-agent
Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-
2,000,000 tokens
x-ai/grok-4.20
Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic tool cal
2,000,000 tokens
google/lyria-3-pro-preview
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music gene
1,048,576 tokens
google/lyria-3-clip-preview
30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of mus
1,048,576 tokens
kwaipilot/kat-coder-pro-v2
KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, des
256,000 tokens
rekaai/reka-edge
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts ima
16,384 tokens
minimax/minimax-m2.7
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-wor
204,800 tokens
openai/gpt-5.4-nano
GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, o
400,000 tokens
openai/gpt-5.4-mini
GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model o
400,000 tokens
mistralai/mistral-small-2603
Mistral Small 4 is the next major release in the Mistral Small family, unifying the capa
262,144 tokens
z-ai/glm-5-turbo
GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance
202,752 tokens
nvidia/nemotron-3-super-120b-a12b
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B p
1,000,000 tokens
nvidia/nemotron-3-super-120b-a12b:free
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B p
1,000,000 tokens
bytedance-seed/seed-2.0-lite
Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong m
262,144 tokens
qwen/qwen3.5-9b
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver
262,144 tokens
openai/gpt-5.4-pro
GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture
1,050,000 tokens
openai/gpt-5.4
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a singl
1,050,000 tokens
inception/mercury-2
Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLL
128,000 tokens
openai/gpt-5.3-chat
GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations
128,000 tokens
google/gemini-3.1-flash-lite-preview
Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volum
1,048,576 tokens
bytedance-seed/seed-2.0-mini
Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios,
262,144 tokens
qwen/qwen3.5-35b-a3b
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid arch
262,144 tokens
qwen/qwen3.5-27b
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mecha
262,144 tokens
qwen/qwen3.5-122b-a10b
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture tha
262,144 tokens
qwen/qwen3.5-flash-02-23
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that
1,000,000 tokens
liquid/lfm-2-24b-a2b
LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed fo
128,000 tokens
google/gemini-3.1-pro-preview-customtools
Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool se
1,048,756 tokens
openai/gpt-5.3-codex
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier sof
400,000 tokens
aion-labs/aion-2.0
Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytell
131,072 tokens
google/gemini-3.1-pro-preview
Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced softwar
1,048,576 tokens
anthropic/claude-sonnet-4.6
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance
1,000,000 tokens
qwen/qwen3.5-plus-02-15
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture
1,000,000 tokens
qwen/qwen3.5-397b-a17b
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architect
262,144 tokens
minimax/minimax-m2.5
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Traine
204,800 tokens
z-ai/glm-5
GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems des
202,752 tokens
qwen/qwen3-max-thinking
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for hig
262,144 tokens
anthropic/claude-opus-4.6
Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks.
1,000,000 tokens
qwen/qwen3-coder-next
Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and
262,144 tokens
openrouter/free
The simplest way to get free inference. openrouter/free is a router that selects free mo
200,000 tokens
stepfun/step-3.5-flash
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse
262,144 tokens
moonshotai/kimi-k2.5
Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual c
262,144 tokens
upstage/solar-pro-3
Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B tot
128,000 tokens
minimax/minimax-m2-her
MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, ch
65,536 tokens
writer/palmyra-x5
Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI ag
1,040,000 tokens
liquid/lfm-2.5-1.2b-thinking:free
LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic task
32,768 tokens
liquid/lfm-2.5-1.2b-instruct:free
LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fa
32,768 tokens
openai/gpt-audio
The gpt-audio model is OpenAI's first generally available audio model. The new snapshot
128,000 tokens
openai/gpt-audio-mini
A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for
128,000 tokens
z-ai/glm-4.7-flash
As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance a
202,752 tokens
openai/gpt-5.2-codex
GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering
400,000 tokens
bytedance-seed/seed-1.6-flash
Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, suppor
262,144 tokens
bytedance-seed/seed-1.6
Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates
262,144 tokens
minimax/minimax-m2.1
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for codin
204,800 tokens
z-ai/glm-4.7
GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced p
202,752 tokens
google/gemini-3-flash-preview
Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic w
1,048,576 tokens
xiaomi/mimo-v2-flash
MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a M
262,144 tokens
nvidia/nemotron-3-nano-30b-a3b
NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute effici
256,000 tokens
nvidia/nemotron-3-nano-30b-a3b:free
NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute effici
256,000 tokens
openai/gpt-5.2-chat
GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized
128,000 tokens
openai/gpt-5.2-pro
GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic codi
400,000 tokens
openai/gpt-5.2
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agenti
400,000 tokens
mistralai/devstral-2512
Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic
262,144 tokens
relace/relace-search
The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a
256,000 tokens
z-ai/glm-4.6v
GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and
131,072 tokens
nex-agi/deepseek-v3.1-nex-n1
DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model
131,072 tokens
essentialai/rnj-1-instruct
Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and
32,768 tokens
openrouter/bodybuilder
Transform your natural language requests into structured OpenRouter API request objects.
128,000 tokens
openai/gpt-5.1-codex-max
GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, hi
400,000 tokens
amazon/nova-2-lite-v1
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can pr
1,000,000 tokens
mistralai/ministral-14b-2512
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilitie
262,144 tokens
mistralai/ministral-8b-2512
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny
262,144 tokens
mistralai/ministral-3b-2512
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient ti
131,072 tokens
mistralai/mistral-large-2512
Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture
262,144 tokens
arcee-ai/trinity-mini
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model fea
131,072 tokens
deepseek/deepseek-v3.2
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficie
131,072 tokens
prime-intellect/intellect-3
INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from
131,072 tokens
anthropic/claude-opus-4.5
Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software e
200,000 tokens
allenai/olmo-3-32b-think
Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep rea
65,536 tokens
deepcogito/cogito-v2.1-671b
Cogito v2.1 671B MoE represents one of the strongest open models globally, matching perf
128,000 tokens
openai/gpt-5.1
GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger genera
400,000 tokens
openai/gpt-5.1-chat
GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, optimized f
128,000 tokens
openai/gpt-5.1-codex
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and
400,000 tokens
openai/gpt-5.1-codex-mini
GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex
400,000 tokens
moonshotai/kimi-k2-thinking
Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending
262,144 tokens
amazon/nova-premier-v1
Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reason
1,000,000 tokens
perplexity/sonar-pro-search
Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexi
200,000 tokens
openai/gpt-oss-safeguard-20b
gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. Th
131,072 tokens
nvidia/nemotron-nano-12b-v2-vl
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model desi
128,000 tokens
nvidia/nemotron-nano-12b-v2-vl:free
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model desi
128,000 tokens
minimax/minimax-m2
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end c
204,800 tokens
qwen/qwen3-vl-32b-instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for hig
262,144 tokens
ibm-granite/granite-4.0-h-micro
Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. These models
131,000 tokens
microsoft/phi-4-mini-instruct
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered p
131,072 tokens
openai/gpt-5-image-mini
GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Min
400,000 tokens
anthropic/claude-haiku-4.5
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-fronti
200,000 tokens
qwen/qwen3-vl-8b-thinking
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal mo
256,000 tokens
qwen/qwen3-vl-8b-instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, bui
256,000 tokens
openai/gpt-5-image
[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with sta
400,000 tokens
openai/o3-deep-research
o3-deep-research is OpenAI's advanced model for deep research, designed to tackle comple
200,000 tokens
openai/o4-mini-deep-research
o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for
200,000 tokens
nvidia/llama-3.3-nemotron-super-49b-v1.5
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat mod
131,072 tokens
qwen/qwen3-vl-30b-a3b-thinking
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with
131,072 tokens
qwen/qwen3-vl-30b-a3b-instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with
262,144 tokens
openai/gpt-5-pro
GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, cod
400,000 tokens
z-ai/glm-4.6
Compared with GLM-4.5, this generation brings several key improvements: Longer context w
202,752 tokens
anthropic/claude-sonnet-4.5
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-
1,000,000 tokens
deepseek/deepseek-v3.2-exp
DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an int
163,840 tokens
thedrummer/cydonia-24b-v4.1
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, p
131,072 tokens
relace/relace-apply-3
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straigh
256,000 tokens
google/gemini-2.5-flash-lite-preview-09-2025
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimiz
1,048,576 tokens
qwen/qwen3-vl-235b-a22b-thinking
Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation wi
131,072 tokens
qwen/qwen3-vl-235b-a22b-instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text
262,144 tokens
qwen/qwen3-max
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements i
262,144 tokens
qwen/qwen3-coder-plus
Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A3
1,000,000 tokens
openai/gpt-5-codex
GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and cod
400,000 tokens
deepseek/deepseek-v3.1-terminus
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) tha
163,840 tokens
qwen/qwen3-coder-flash
Qwen3 Coder Flash is Alibaba's fast and cost efficient version of their proprietary Qwen
1,000,000 tokens
qwen/qwen3-next-80b-a3b-thinking
Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that
262,144 tokens
qwen/qwen3-next-80b-a3b-instruct
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series
262,144 tokens
qwen/qwen3-next-80b-a3b-instruct:free
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series
262,144 tokens
qwen/qwen-plus-2025-07-28
Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reaso
1,000,000 tokens
qwen/qwen-plus-2025-07-28:thinking
Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reaso
1,000,000 tokens
nvidia/nemotron-nano-9b-v2
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDI
128,000 tokens
nvidia/nemotron-nano-9b-v2:free
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDI
128,000 tokens
moonshotai/kimi-k2-0905
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a larg
262,144 tokens
qwen/qwen3-30b-a3b-thinking-2507
Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimi
131,072 tokens
nousresearch/hermes-4-70b
Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B
131,072 tokens
nousresearch/hermes-4-405b
Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by N
131,072 tokens
deepseek/deepseek-chat-v3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that suppo
163,840 tokens
mistralai/mistral-medium-3.1
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performanc
131,072 tokens
baidu/ernie-4.5-vl-28b-a3b
A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with
131,072 tokens
z-ai/glm-4.5v
GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built
65,536 tokens
ai21/jamba-large-1.7
Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in g
256,000 tokens
openai/gpt-5-chat
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversation
128,000 tokens
openai/gpt-5
GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code qu
400,000 tokens
openai/gpt-5-mini
GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning ta
400,000 tokens
openai/gpt-5-nano
GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for develo
400,000 tokens
openai/gpt-oss-120b
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model f
131,072 tokens
openai/gpt-oss-120b:free
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model f
131,072 tokens
openai/gpt-oss-20b
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.
131,072 tokens
openai/gpt-oss-20b:free
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.
131,072 tokens
anthropic/claude-opus-4.1
Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved p
200,000 tokens
mistralai/codestral-2508
Mistral's cutting-edge language model for coding released end of July 2025. Codestral sp
256,000 tokens
qwen/qwen3-coder-30b-a3b-instruct
Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 12
160,000 tokens
qwen/qwen3-30b-a3b-instruct-2507
Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from
262,144 tokens
z-ai/glm-4.5
GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applicati
131,072 tokens
z-ai/glm-4.5-air
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose
131,072 tokens
z-ai/glm-4.5-air:free
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose
131,072 tokens
qwen/qwen3-235b-a22b-thinking-2507
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE
262,144 tokens
z-ai/glm-4-32b
GLM 4 32B is a cost-effective foundation language model. It can efficiently perform comp
128,000 tokens
qwen/qwen3-coder
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model devel
1,048,576 tokens
qwen/qwen3-coder:free
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model devel
1,048,576 tokens
bytedance/ui-tars-1.5-7b
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments,
128,000 tokens
google/gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimiz
1,048,576 tokens
qwen/qwen3-235b-a22b-2507
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts la
262,144 tokens
switchpoint/router
Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI
131,072 tokens
moonshotai/kimi-k2
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by M
131,072 tokens
cognitivecomputations/dolphin-mistral-24b-venice-edition:free
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-
32,768 tokens
tencent/hunyuan-a13b-instruct
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed
131,072 tokens
morph/morph-v3-large
Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% acc
262,144 tokens
morph/morph-v3-fast
Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rap
81,920 tokens
baidu/ernie-4.5-vl-424b-a47b
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE
131,072 tokens
mistralai/mistral-small-3.2-24b-instruct
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optim
128,000 tokens
minimax/minimax-m1
MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context a
1,000,000 tokens
google/gemini-2.5-flash
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for
1,048,576 tokens
google/gemini-2.5-pro
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, co
1,048,576 tokens
openai/o3-pro
The o-series of models are trained with reinforcement learning to think before they answ
200,000 tokens
google/gemini-2.5-pro-preview
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, co
1,048,576 tokens
deepseek/deepseek-r1-0528
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1) Performance on par
163,840 tokens
anthropic/claude-opus-4
Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bring
200,000 tokens
anthropic/claude-sonnet-4
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7,
1,000,000 tokens
google/gemma-3n-e4b-it
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices,
32,768 tokens
mistralai/mistral-medium-3
Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliv
131,072 tokens
google/gemini-2.5-pro-preview-05-06
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, co
1,048,576 tokens
arcee-ai/spotlight
Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fi
131,072 tokens
arcee-ai/maestro-reasoning
Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwe
131,072 tokens
arcee-ai/virtuoso-large
Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tack
131,072 tokens
arcee-ai/coder-large
Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further tra
32,768 tokens
meta-llama/llama-guard-4-12b
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for con
163,840 tokens
qwen/qwen3-30b-a3b
Qwen3, the latest generation in the Qwen large language model series, features both dens
131,072 tokens
qwen/qwen3-8b
Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed
131,072 tokens
qwen/qwen3-14b
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, design
131,702 tokens
qwen/qwen3-32b
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimi
131,072 tokens
qwen/qwen3-235b-a22b
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, ac
131,072 tokens
openai/o4-mini-high
OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effor
200,000 tokens
openai/o3
o3 is a well-rounded and powerful model across domains. It sets a new standard for math,
200,000 tokens
openai/o4-mini
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-ef
200,000 tokens
openai/gpt-4.1
GPT-4.1 is a flagship large language model optimized for advanced instruction following,
1,047,576 tokens
openai/gpt-4.1-mini
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at subs
1,047,576 tokens
openai/gpt-4.1-nano
For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the
1,047,576 tokens
meta-llama/llama-4-maverick
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from M
1,048,576 tokens
meta-llama/llama-4-scout
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed
10,000,000 tokens
deepseek/deepseek-chat-v3-0324
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the
163,840 tokens
openai/o1-pro
The o1 series of models are trained with reinforcement learning to think before they ans
200,000 tokens
mistralai/mistral-small-3.1-24b-instruct
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuri
128,000 tokens
google/gemma-3-4b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It
131,072 tokens
google/gemma-3-12b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It
131,072 tokens
cohere/command-a
Command A is an open-weights 111B parameter model with a 256k context window focused on
256,000 tokens
openai/gpt-4o-mini-search-preview
GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It
128,000 tokens
openai/gpt-4o-search-preview
GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is tr
128,000 tokens
rekaai/reka-flash-3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billio
65,536 tokens
google/gemma-3-27b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It
131,072 tokens
thedrummer/skyfall-36b-v2
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned f
32,768 tokens
perplexity/sonar-reasoning-pro
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://d
128,000 tokens
perplexity/sonar-pro
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://d
200,000 tokens
perplexity/sonar-deep-research
Sonar Deep Research is a research-focused model designed for multi-step retrieval, synth
128,000 tokens
mistralai/mistral-saba
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East
32,768 tokens
meta-llama/llama-guard-3-8b
Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classifi
131,072 tokens
openai/o3-mini-high
OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effor
200,000 tokens
aion-labs/aion-1.0
Aion-1.0 is a multi-model system designed for high performance across various tasks, inc
131,072 tokens
aion-labs/aion-1.0-mini
Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, desig
131,072 tokens
aion-labs/aion-rp-llama-3.1-8b
Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBenc
32,768 tokens
qwen/qwen2.5-vl-72b-instruct
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and
131,072 tokens
qwen/qwen-plus
Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balance
1,000,000 tokens
openai/o3-mini
OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, pa
200,000 tokens
mistralai/mistral-small-24b-instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance
32,768 tokens
deepseek/deepseek-r1-distill-qwen-32b
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B]
128,000 tokens
perplexity/sonar
Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and
127,072 tokens
deepseek/deepseek-r1-distill-llama-70b
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70
131,072 tokens
deepseek/deepseek-r1
DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced a
163,840 tokens
minimax/minimax-01
MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image
1,000,192 tokens
microsoft/phi-4
[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning
16,384 tokens
sao10k/l3.1-70b-hanami-x1
This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).
16,000 tokens
deepseek/deepseek-chat
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction fo
131,072 tokens
sao10k/l3.3-euryale-70b
Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com
131,072 tokens
openai/o1
The latest and strongest model family from OpenAI, o1 is designed to spend more time thi
200,000 tokens
cohere/command-r7b-12-2024
Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in Dece
128,000 tokens
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instructi
131,072 tokens
meta-llama/llama-3.3-70b-instruct:free
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instructi
131,072 tokens
amazon/nova-lite-v1
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fas
300,000 tokens
amazon/nova-micro-v1
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in
128,000 tokens
amazon/nova-pro-v1
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a com
300,000 tokens
openai/gpt-4o-2024-11-20
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more
128,000 tokens
mistralai/mistral-large-2407
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's
131,072 tokens
qwen/qwen-2.5-coder-32b-instruct
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly
128,000 tokens
thedrummer/unslopnemo-12b
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adven
32,768 tokens
anthropic/claude-3.5-haiku
Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and to
200,000 tokens
anthracite-org/magnum-v4-72b
This is a series of models designed to replicate the prose quality of the Claude 3 model
32,768 tokens
qwen/qwen-2.5-7b-instruct
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the follow
131,072 tokens
inflection/inflection-3-pi
Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, em
8,000 tokens
inflection/inflection-3-productivity
Inflection 3 Productivity is optimized for following instructions. It is better for task
8,000 tokens
thedrummer/rocinante-12b
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have r
32,768 tokens
meta-llama/llama-3.2-11b-vision-instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handl
131,072 tokens
meta-llama/llama-3.2-1b-instruct
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing n
131,072 tokens
meta-llama/llama-3.2-3b-instruct
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for a
131,072 tokens
meta-llama/llama-3.2-3b-instruct:free
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for a
131,072 tokens
qwen/qwen-2.5-72b-instruct
Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the follo
131,072 tokens
cohere/command-r-08-2024
command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improve
128,000 tokens
cohere/command-r-plus-08-2024
command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) w
128,000 tokens
sao10k/l3.1-euryale-70b
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-f
131,072 tokens
nousresearch/hermes-3-llama-3.1-70b
Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/n
131,072 tokens
nousresearch/hermes-3-llama-3.1-405b
Hermes 3 is a generalist language model with many improvements over Hermes 2, including
131,072 tokens
nousresearch/hermes-3-llama-3.1-405b:free
Hermes 3 is a generalist language model with many improvements over Hermes 2, including
131,072 tokens
sao10k/l3-lunaris-8b
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a stra
8,192 tokens
openai/gpt-4o-2024-08-06
The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with
128,000 tokens
meta-llama/llama-3.1-70b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. Thi
131,072 tokens
meta-llama/llama-3.1-8b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. Thi
131,072 tokens
mistralai/mistral-nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration
131,072 tokens
openai/gpt-4o-mini
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporti
128,000 tokens
openai/gpt-4o-mini-2024-07-18
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporti
128,000 tokens
google/gemma-2-27b-it
Gemma 2 27B by Google is an open model built from the same research and technology used
8,192 tokens
openai/gpt-4o
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inpu
128,000 tokens
openai/gpt-4o-2024-05-13
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inpu
128,000 tokens
meta-llama/llama-3-70b-instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This
8,192 tokens
meta-llama/llama-3-8b-instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This
8,192 tokens
mistralai/mixtral-8x22b-instruct
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixt
65,536 tokens
microsoft/wizardlm-2-8x22b
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly co
65,536 tokens
openai/gpt-4-turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON
128,000 tokens
anthropic/claude-3-haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsive
200,000 tokens
mistralai/mistral-large
This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It'
128,000 tokens
openai/gpt-3.5-turbo-0613
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language
4,095 tokens
openai/gpt-4-turbo-preview
The preview GPT-4 model with improved instruction following, JSON mode, reproducible out
128,000 tokens
openai/gpt-4-1106-preview
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON
128,000 tokens
openai/gpt-3.5-turbo-instruct
This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting ch
4,095 tokens
openai/gpt-3.5-turbo-16k
This model offers four times the context length of gpt-3.5-turbo, allowing it to support
16,385 tokens
mancer/weaver
An attempt to recreate Claude-style verbosity, but don't expect the same level of cohere
8,000 tokens
undi95/remm-slerp-l2-13b
A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
6,144 tokens
gryphe/mythomax-l2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich desc
4,096 tokens
openai/gpt-3.5-turbo
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language
16,385 tokens
openai/gpt-4
OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of sol
8,191 tokens
alfredpros/codellama-7b-instruct-solidity
4,096 tokens
arcee-ai/trinity-large-thinking:free
4,096 tokens
baidu/cobuddy:free
4,096 tokens
baidu/ernie-4.5-21b-a3b
131,072 tokens
baidu/ernie-4.5-21b-a3b-thinking
131,072 tokens
baidu/ernie-4.5-300b-a47b
131,072 tokens
baidu/qianfan-ocr-fast:free
4,096 tokens
chatgpt-4o-latest
4,096 tokens
claude-3-5-haiku
4,096 tokens
claude-3-5-haiku-20241022
4,096 tokens
claude-3-5-sonnet
4,096 tokens
claude-3-5-sonnet-20240620
4,096 tokens
claude-3-7-sonnet-20250219
4,096 tokens
claude-3-7-sonnet-20250219-thinking
4,096 tokens
claude-3-sonnet-20240229
4,096 tokens
claude-3.7-sonnet-thinking
4,096 tokens
claude-4-opus-thinking
4,096 tokens
claude-4-sonnet-thinking
4,096 tokens
claude-haiku-4-5-20251001
4,096 tokens
claude-opus-4-1-20250805
4,096 tokens
claude-opus-4-1-20250805-thinking
4,096 tokens
claude-opus-4-20250514-thinking
4,096 tokens
claude-opus-4-5
4,096 tokens
claude-opus-4-5-20251101
4,096 tokens
claude-opus-4-5-20251101-thinking
4,096 tokens
claude-opus-4-6
4,096 tokens
claude-opus-4-7
4,096 tokens
claude-opus-4-7-max
4,096 tokens
claude-sonnet-4-20250514-thinking
4,096 tokens
claude-sonnet-4-5-20250929
4,096 tokens
claude-sonnet-4-5-20250929-thinking
4,096 tokens
claude-sonnet-4-6
4,096 tokens
deepseek-ai/DeepSeek-V3
4,096 tokens
deepseek-ai/DeepSeek-V3-0324
4,096 tokens
deepseek-ai/DeepSeek-V3.1
4,096 tokens
deepseek-ai/DeepSeek-V3.2-Exp
163,840 tokens
deepseek-r1-searching
4,096 tokens
deepseek-v3-1-250821
4,096 tokens
deepseek-v3-1-terminus
4,096 tokens
deepseek-v3.1-0821
4,096 tokens
deepseek-v3.1-think
4,096 tokens
deepseek-v3.2-think
4,096 tokens
deepseek/deepseek-v4-flash:free
1,048,576 tokens
Dolphin3.0-R1-Mistral-24B
4,096 tokens
doubao-seedance-1-0-pro-250528
4,096 tokens
doubao-seedance-1-0-pro-fast-251015
4,096 tokens
doubao-seedance-1-5-pro
4,096 tokens
doubao-seedance-1-5-pro-251215
4,096 tokens
doubao-seedream-3-0-t2i-250415
4,096 tokens
doubao-seedream-4-0
4,096 tokens
doubao-seedream-4-0-250828
4,096 tokens
doubao-seedream-4-0-4k
4,096 tokens
doubao-seedream-4-5
4,096 tokens
doubao-seedream-4-5-251128
4,096 tokens
doubao-seedream-4-5-4k
4,096 tokens
doubao-seedream-5-0
4,096 tokens
doubao-seedream-5-0-260128
4,096 tokens
doubao-seedream-5-0-4k
4,096 tokens
gemini-1.5-flash
4,096 tokens
gemini-1.5-flash-002
4,096 tokens
gemini-1.5-flash-exp-0827
4,096 tokens
gemini-2.0-flash
4,096 tokens
gemini-2.0-flash-exp
4,096 tokens
gemini-2.0-flash-lite
4,096 tokens
gemini-2.0-flash-lite-preview
4,096 tokens
gemini-2.0-flash-thinking-exp-1219
4,096 tokens
gemini-2.5-flash-image
32,768 tokens
gemini-2.5-flash-preview-04-17
4,096 tokens
gemini-2.5-flash-preview-09-2025
4,096 tokens
gemini-2.5-pro-ci
4,096 tokens
gemini-2.5-pro-preview-03-25
4,096 tokens
gemini-2.5-pro-preview-06-05
4,096 tokens
gemini-3-1-pro-high
4,096 tokens
gemini-3-fast
4,096 tokens
gemini-3-fast-all
4,096 tokens
gemini-3-fast-deepsearch
4,096 tokens
gemini-3-flash-all
4,096 tokens
gemini-3-pro
4,096 tokens
gemini-3-pro-canvas
4,096 tokens
gemini-3-pro-ci
4,096 tokens
gemini-3-pro-deepsearch
4,096 tokens
gemini-3-pro-high-ci
4,096 tokens
gemini-3-pro-latest
4,096 tokens
gemini-3-thinking
4,096 tokens
gemini-3.1-fast
4,096 tokens
gemini-3.1-pro
4,096 tokens
gemini-3.1-pro-ci
4,096 tokens
gemini-3.1-thinking
4,096 tokens
gemini-3.5-flash-ci
4,096 tokens
gemini-3.5-thinking
4,096 tokens
glm-4
4,096 tokens
glm-4-airx
4,096 tokens
glm-4-flash
4,096 tokens
glm-4-long
4,096 tokens
glm-4.5-x
4,096 tokens
google/gemini-2.0-flash-001
1,048,576 tokens
google/gemini-2.0-flash-lite-001
1,048,576 tokens
google/gemini-3-pro-image
4,096 tokens
gpt-3.5-turbo-0125
4,096 tokens
gpt-3.5-turbo-0301
4,096 tokens
gpt-3.5-turbo-1106
4,096 tokens
gpt-3.5-turbo-16k-0613
4,096 tokens
gpt-4-0125-preview
4,096 tokens
gpt-4-0613
4,096 tokens
gpt-4-vision-preview
4,096 tokens
gpt-5-codex-mini
4,096 tokens
grok-3
4,096 tokens
grok-3-all
4,096 tokens
grok-3-ci
4,096 tokens
grok-3-deepersearch
4,096 tokens
grok-3-deepsearch
4,096 tokens
grok-3-image
4,096 tokens
grok-3-reasoning
4,096 tokens
grok-3-search
4,096 tokens
grok-4
4,096 tokens
grok-4-0709
4,096 tokens
grok-4-1-thinking-1129
4,096 tokens
grok-4-auto
4,096 tokens
grok-4-ci
4,096 tokens
grok-4-fast-ci
4,096 tokens
grok-4-image
4,096 tokens
grok-4-mini-thinking-tahoe
4,096 tokens
grok-4.1
4,096 tokens
grok-4.1-ci
4,096 tokens
grok-4.1-fast
4,096 tokens
grok-4.1-fast-ci
4,096 tokens
grok-4.1-image
4,096 tokens
grok-4.1-thinking
4,096 tokens
grok-4.2
4,096 tokens
grok-4.2-ci
4,096 tokens
grok-4.2-fast
4,096 tokens
grok-4.2-fast-ci
4,096 tokens
grok-4.2-image
4,096 tokens
grok-4.3-ci
4,096 tokens
grok-420-agents
4,096 tokens
grok-420-fast
4,096 tokens
grok-420-thinking
4,096 tokens
grok-code-fast-1
4,096 tokens
inclusionai/ring-2.6-1t:free
4,096 tokens
japanese-stable-diffusion-xl
4,096 tokens
kimi-k2-0711-preview
4,096 tokens
kimi-k2-250905
4,096 tokens
kimi-k2-250905-ci
4,096 tokens
kimi-k2-instruct-0905
4,096 tokens
llama-2-13b
4,096 tokens
llama-2-70b
4,096 tokens
llama-3-sonar-large-32k-chat
4,096 tokens
llama-3-sonar-small-32k-chat
4,096 tokens
Llama-3.1-405B
4,096 tokens
llama-3.1-405b-instruct
4,096 tokens
Meta-Llama-3-3-70B-Instruct
4,096 tokens
meta-llama/llama-3.1-405b-instruct:free
4,096 tokens
meta-llama/llama-3.2-90b-vision-instruct
4,096 tokens
meta-llama/llama-3.2-90b-vision-instruct:free
4,096 tokens
meta-llama/Meta-Llama-3.1-405B-Instruct
4,096 tokens
microsoft/phi-3-medium-128k-instruct
4,096 tokens
microsoft/phi-3-medium-128k-instruct:free
4,096 tokens
minimax/minimax-m2.5:free
204,800 tokens
mistral-small-2407
4,096 tokens
mistral-small-latest
4,096 tokens
mistralai/devstral-medium
131,072 tokens
mistralai/devstral-small
131,072 tokens
mistralai/mistral-7b-instruct-v0.1
4,096 tokens
mistralai/mistral-large-2411
131,072 tokens
mistralai/pixtral-large-2411
131,072 tokens
mixtral-8x22b-instruct-v0.1
4,096 tokens
moonshot/kimi-k2.5
4,096 tokens
nousresearch/hermes-2-pro-llama-3-8b
8,192 tokens
o1-mini
4,096 tokens
o3-mini-2025-01-31-high
4,096 tokens
o3-mini-all
4,096 tokens
qvq-72b-preview-0310
4,096 tokens
qwen-3.5-plus
4,096 tokens
qwen-3.5-plus-search
4,096 tokens
qwen-3.5-plus-think
4,096 tokens
qwen-image-max
4,096 tokens
qwen-image-plus
4,096 tokens
qwen-max
4,096 tokens
qwen-max-search
4,096 tokens
qwen-plus-2025-09-11-think
4,096 tokens
qwen-plus-search
4,096 tokens
Qwen-QwQ-32B
4,096 tokens
qwen-turbo
4,096 tokens
qwen-vl-max
4,096 tokens
Qwen/Qwen1.5-110B-Chat
4,096 tokens
Qwen/Qwen2.5-72B-Instruct
4,096 tokens
Qwen/Qwen2.5-Coder-32B-Instruct
4,096 tokens
qwen/qwen3-max-2026
4,096 tokens
qwen/qwq-32b
4,096 tokens
qwen/qwq-72b-preview
4,096 tokens
qwen2.5-32b-instruct
4,096 tokens
qwen3-235b-a22b-search
4,096 tokens
qwen3-235b-a22b-thinking-2507-search
4,096 tokens
qwen3-30b-a3b-instruct-2507-search
4,096 tokens
qwen3-30b-a3b-think
4,096 tokens
qwen3-32b-think
4,096 tokens
qwen3-coder-480b-a35b-instruct
4,096 tokens
qwen3-coder-480b-a35b-instruct-search
4,096 tokens
qwen3-max-2025-10-30-think
4,096 tokens
qwen3-max-preview
4,096 tokens
qwen3-max-think
4,096 tokens
qwen3-vl-plus
4,096 tokens
qwen3-vl-plus-think
4,096 tokens
qwen3.6-plus-preview
4,096 tokens
qwq-32b-search
4,096 tokens
qwq-plus-latest
4,096 tokens
qwq-plus-latest-thinking
4,096 tokens
sao10k/l3-euryale-70b
8,192 tokens
sora_image
4,096 tokens
suno_lyrics
4,096 tokens
suno_music
4,096 tokens
test
4,096 tokens
wan2.6-5s
4,096 tokens
wan2.6-video-5s
4,096 tokens
x-ai/grok-4-fast
4,096 tokens
xiaomi/mimo-v2-omni
262,144 tokens
xiaomi/mimo-v2-pro
1,048,576 tokens
z-image-turbo
4,096 tokens
zai-org/glm-4.5
131,072 tokens
zai-org/glm-4.5-air
131,072 tokens
alibaba/wan-2.6
4,096 tokens
alibaba/wan-2.7
4,096 tokens
baai/bge-base-en-v1.5
8,192 tokens
baai/bge-large-en-v1.5
8,192 tokens
baai/bge-m3
8,192 tokens
black-forest-labs/flux.2-flex
67,344 tokens
black-forest-labs/flux.2-klein-4b
40,960 tokens
black-forest-labs/flux.2-max
46,864 tokens
black-forest-labs/flux.2-pro
46,864 tokens
bytedance-seed/seedream-4.5
4,096 tokens
bytedance/seedance-1-5-pro
4,096 tokens
bytedance/seedance-2.0
4,096 tokens
bytedance/seedance-2.0-fast
4,096 tokens
canopylabs/orpheus-3b-0.1-ft
4,096 tokens
cohere/rerank-4-fast
32,768 tokens
cohere/rerank-4-pro
32,768 tokens
cohere/rerank-v3.5
4,096 tokens
google/chirp-3
4,096 tokens
google/gemini-3.1-flash-tts-preview
8,192 tokens
google/gemini-embedding-001
20,000 tokens
google/gemini-embedding-2
8,192 tokens
google/gemini-embedding-2-preview
8,192 tokens
google/veo-3.1
4,096 tokens
google/veo-3.1-fast
4,096 tokens
google/veo-3.1-lite
4,096 tokens
hexgrad/kokoro-82m
4,096 tokens
intfloat/e5-base-v2
8,192 tokens
intfloat/e5-large-v2
8,192 tokens
intfloat/multilingual-e5-large
8,192 tokens
kwaivgi/kling-v3.0-pro
4,096 tokens
kwaivgi/kling-v3.0-std
4,096 tokens
kwaivgi/kling-video-o1
4,096 tokens
minimax/hailuo-2.3
4,096 tokens
mistralai/codestral-embed-2505
8,192 tokens
mistralai/mistral-embed-2312
8,192 tokens
mistralai/voxtral-mini-transcribe
4,096 tokens
mistralai/voxtral-mini-tts-2603
4,096 tokens
nvidia/llama-nemotron-embed-vl-1b-v2
131,072 tokens
nvidia/parakeet-tdt-0.6b-v3
4,096 tokens
openai/gpt-4o-mini-transcribe
4,096 tokens
openai/gpt-4o-mini-tts-2025-12-15
4,096 tokens
openai/gpt-4o-transcribe
4,096 tokens
openai/sora-2-pro
4,096 tokens
openai/text-embedding-3-large
8,192 tokens
openai/text-embedding-3-small
8,192 tokens
openai/text-embedding-ada-002
8,192 tokens
openai/whisper-1
4,096 tokens
openai/whisper-large-v3-turbo
4,096 tokens
perplexity/pplx-embed-v1-0.6b
32,000 tokens
perplexity/pplx-embed-v1-4b
32,000 tokens
qwen/qwen3-asr-flash-2026-02-10
4,096 tokens
qwen/qwen3-embedding-4b
32,768 tokens
qwen/qwen3-embedding-8b
32,000 tokens
recraft/recraft-v3
65,536 tokens
recraft/recraft-v4
65,536 tokens
recraft/recraft-v4-pro
65,536 tokens
recraft/recraft-v4-pro-vector
65,536 tokens
recraft/recraft-v4-vector
65,536 tokens
recraft/recraft-v4.1
65,536 tokens
recraft/recraft-v4.1-pro
65,536 tokens
recraft/recraft-v4.1-pro-vector
65,536 tokens
recraft/recraft-v4.1-utility
65,536 tokens
recraft/recraft-v4.1-utility-pro
65,536 tokens
recraft/recraft-v4.1-vector
65,536 tokens
sentence-transformers/all-minilm-l12-v2
8,192 tokens
sentence-transformers/all-minilm-l6-v2
8,192 tokens
sentence-transformers/all-mpnet-base-v2
8,192 tokens
sentence-transformers/multi-qa-mpnet-base-dot-v1
8,192 tokens
sentence-transformers/paraphrase-minilm-l6-v2
8,192 tokens
sesame/csm-1b
4,096 tokens
sourceful/riverflow-v2-fast
8,192 tokens
sourceful/riverflow-v2-fast-preview
8,192 tokens
sourceful/riverflow-v2-max-preview
8,192 tokens
sourceful/riverflow-v2-pro
8,192 tokens
sourceful/riverflow-v2-standard-preview
8,192 tokens
thenlper/gte-base
8,192 tokens
thenlper/gte-large
8,192 tokens
x-ai/grok-imagine-image-quality
65,536 tokens
x-ai/grok-imagine-video
4,096 tokens
x-ai/grok-voice-tts-1.0
15,000 tokens
zyphra/zonos-v0.1-hybrid
4,096 tokens
zyphra/zonos-v0.1-transformer
4,096 tokens
openai/gpt-realtime
4,096 tokens
openai/gpt-realtime-mini
4,096 tokens
openbmb/MiniCPM-o-4_5
4,096 tokens
kimi-k2.5
262,144 tokens
openai/whisper-large-v3
4,096 tokens
openai/gpt-4o-mini-realtime-preview
4,096 tokens
openai/gpt-4o-realtime-preview
4,096 tokens