Orbitrage

Models

Hundreds of models across every major provider. Each entry shows where it shines and where it struggles — so the router's pick is never a black box.

xAI

Grok 4.3

New
xAI

1M context, vision, competitive pricing vs frontier.

Newer — less community benchmarking than GPT/Claude.

Qual 72
Ctx 1M
In/M $1.25
Out/M $2.50
Speed 90 t/s
xAI

Grok 4.20 Reasoning

New
xAI

Built-in chain-of-thought at near-flagship price.

Slower than non-reasoning variants.

Qual 70
Ctx 1M
In/M $1.25
Out/M $2.50
Speed 85 t/s
xAI

Grok 4.20 Multi-Agent

New
xAI

Optimised for tool-calling and multi-agent loops.

Niche use case; general tasks run better on the standard variants.

Qual 68
Ctx 1M
In/M $1.25
Out/M $2.50
Speed 88 t/s
xAI

Grok 4.20

New
xAI

Fast general-purpose model at the same price as the other Grok 4.20 variants.

No reasoning trace; raw quality below frontier.

Qual 66
Ctx 1M
In/M $1.25
Out/M $2.50
Speed 95 t/s
OpenAI

GPT-5.5

New
OpenAI

Top multimodal generalist with native audio in/out.

Highest output price in its tier.

Qual 60
Ctx 922K
In/M $11.25
Out/M $45
Speed 72 t/s
xAI

Grok 4 Fast

xAI

Quick turnaround with vision support.

Context capped at 256K.

Qual 60
Ctx 256K
In/M $1.25
Out/M $2.50
Speed 120 t/s
xAI

Grok 3

xAI

Real-time X/Twitter context plus solid vision.

Pricier per token than the Grok 4.x series.

Qual 58
Ctx 131K
In/M $3
Out/M $15
Speed 80 t/s
Anthropic

Claude Opus 4.7

New
Anthropic

Frontier reasoning, agentic coding, 1M-token context.

Premium price; slower throughput than Sonnet/Haiku.

Qual 57
Ctx 1M
In/M $5
Out/M $25
Speed 48 t/s
OpenAI

GPT-5.4

New
OpenAI

Wide 1M+ context; strong all-rounder at half the cost of GPT-5.5.

Slower than Flash-class competitors.

Qual 57
Ctx 1.05M
In/M $5.63
Out/M $22.50
Speed 80 t/s
Google

Gemini 3.1 Pro Preview

New
Google

Best long-context multimodal, including video understanding.

Preview-stage; quotas and edge-case quirks.

Qual 57
Ctx 1M
In/M $4.50
Out/M $18
Speed 116 t/s
OpenAI

o1

OpenAI

Hard math, science, planning — deep reasoning.

Expensive and slow; not for chat.

Qual 55
Ctx 200K
In/M $15
Out/M $60
Speed 30 t/s
xAI

Grok Build 0.1

xAI

Budget preview channel; priced below the Grok 4.x line.

Early preview — may change without notice.

Qual 55
Ctx 256K
In/M $1
Out/M $2
Speed 115 t/s
Anthropic

Claude Sonnet 4.6

New
Anthropic

Balanced flagship — most use cases at a sane price.

Trails Opus on the hardest reasoning tasks.

Qual 52
Ctx 200K
In/M $3
Out/M $15
Speed 65 t/s
DeepSeek

DeepSeek V4 Pro

New
DeepSeek

State-of-the-art coding for the price.

Slow output; text-only.

Qual 52
Ctx 1M
In/M $0.14
Out/M $3.48
Speed 40 t/s
Zhipu AI

GLM-5.1

New
Zhipu AI

Open Chinese-English flagship with solid reasoning.

Limited multimodal support.

Qual 52
Ctx 200K
In/M
Out/M
Speed 60 t/s
OpenAI

o3-mini

OpenAI

Cost-efficient chain-of-thought reasoning.

No image input; adds latency because it reasons before answering.

Qual 50
Ctx 200K
In/M $1.10
Out/M $4.40
Speed 85 t/s
Meta

Llama 4 Maverick

New
Meta

Open-weights model with 1M context and image support.

Slower hosted speeds than closed peers.

Qual 50
Ctx 1M
In/M $0.40
Out/M $1.60
Speed 50 t/s
Zhipu AI

GLM-5

New
Zhipu AI

Free open-weights model with strong reasoning.

Trails GLM-5.1 on benchmarks.

Qual 50
Ctx 200K
In/M
Out/M
Speed 60 t/s
Moonshot AI

Kimi K2.5

New
Moonshot AI

Long-context Chinese flagship with vision.

Less common in Western tooling.

Qual 49
Ctx 256K
In/M
Out/M
Speed 70 t/s
MiniMax

MiniMax M2.7

New
MiniMax

Open agentic reasoning model.

Text-only; smaller community.

Qual 49
Ctx 205K
In/M
Out/M
Speed 90 t/s
OpenAI

GPT-4o

OpenAI

Native voice + image, fast multimodal pipelines.

Outclassed on quality by the 5.x line.

Qual 48
Ctx 128K
In/M $2.50
Out/M $10
Speed 90 t/s
DeepSeek

DeepSeek V4 Flash

New
DeepSeek

Cheap, fast coding-tuned model.

Trails Pro on hard reasoning.

Qual 47
Ctx 1M
In/M $0.14
Out/M $0.28
Speed 82 t/s
Alibaba

Qwen3 235B A22B

New
Alibaba

Open MoE with near-frontier reasoning.

Heavy to host; slower output.

Qual 47
Ctx 128K
In/M $0.14
Out/M $0.56
Speed 55 t/s
Google

Gemini 3 Flash

New
Google

Highest throughput at this quality tier.

Lower reasoning ceiling than the Pro variant.

Qual 46
Ctx 1M
In/M $1.13
Out/M $4.50
Speed 161 t/s
Google

Gemma 4 31B

New
Google

Open-weights flagship with full multimodal support, including video.

Hardware-hungry to serve at full quality.

Qual 46
Ctx 256K
In/M
Out/M
Speed 90 t/s
Google

Gemini 1.5 Pro

Google

Mature 1M context with full audio + video understanding.

Outclassed by the 3.x generation.

Qual 45
Ctx 1M
In/M $1.25
Out/M $5
Speed 100 t/s
Meta

Llama 4 Scout

New
Meta

Massive 10M token context — needle-in-haystack king.

Quality below Maverick on dense reasoning.

Qual 45
Ctx 10M
In/M $0.20
Out/M $0.60
Speed 80 t/s
Mistral

Mistral Large 3

New
Mistral

European flagship with strong vision.

Expensive vs comparable open peers.

Qual 45
Ctx 128K
In/M $2
Out/M $6
Speed 80 t/s
Google

Gemma 4 26B MoE

New
Google

MoE efficiency at near-31B quality.

All experts must fit in memory at load time.

Qual 44
Ctx 256K
In/M
Out/M
Speed 120 t/s
xAI

Grok 2

xAI

Solid multimodal with real-time web context.

Behind the frontier on raw reasoning.

Qual 44
Ctx 131K
In/M $2
Out/M $10
Speed 75 t/s
Alibaba

Qwen 3.5 Max

New
Alibaba

Strong open multilingual model with vision.

Moderate speed for its size.

Qual 44
Ctx 262K
In/M $0.14
Out/M $0.56
Speed 100 t/s
DeepSeek

DeepSeek R1

DeepSeek

Open reasoning model with visible chain-of-thought.

Short context; slow generation.

Qual 43
Ctx 64K
In/M $0.55
Out/M $2.19
Speed 45 t/s
Microsoft

Phi-4 Reasoning

New
Microsoft

Tiny model with surprising reasoning chops.

Only 16K context.

Qual 43
Ctx 16K
In/M $0.070
Out/M $0.14
Speed 110 t/s
Perplexity

Sonar Pro

Perplexity

Live web search baked into every reply.

Text-only; quality bound by search results.

Qual 43
Ctx 200K
In/M $3
Out/M $15
Speed 90 t/s
Google

Gemma 3 27B

Google

Solid open vision-text baseline.

No audio/video; older generation.

Qual 42
Ctx 128K
In/M
Out/M
Speed 95 t/s
xAI

Grok 3 Mini

xAI

Low-cost Grok variant; good for simple tasks.

Text-only; limited reasoning depth.

Qual 42
Ctx 131K
In/M $0.30
Out/M $0.50
Speed 130 t/s
Anthropic

Claude Haiku 4.5

Anthropic

Cheap, fast Claude with image understanding.

Limited deep reasoning vs Sonnet/Opus.

Qual 40
Ctx 200K
In/M $0.80
Out/M $4
Speed 120 t/s
Microsoft

Phi-4 Multimodal

New
Microsoft

Smallest fully multimodal model — runs on a laptop.

Tiny context; modest quality.

Qual 40
Ctx 16K
In/M $0.070
Out/M $0.14
Speed 120 t/s
Microsoft

Phi-4 14B

Microsoft

Cheap reasoning baseline for simple jobs.

Text-only; short context.

Qual 40
Ctx 16K
In/M $0.070
Out/M $0.14
Speed 110 t/s
Cohere

Command R+

Cohere

Tuned for retrieval / RAG and tool use.

Behind the frontier on raw IQ.

Qual 40
Ctx 128K
In/M $2.50
Out/M $10
Speed 70 t/s
OpenAI

GPT-4o mini

OpenAI

Cheapest OpenAI multimodal; great for high-volume tasks.

Quality dips on complex reasoning.

Qual 38
Ctx 128K
In/M $0.15
Out/M $0.60
Speed 110 t/s
Google

Gemini 1.5 Flash

Google

Cheap and very fast for simple multimodal tasks.

Lower output quality on hard prompts.

Qual 38
Ctx 1M
In/M $0.075
Out/M $0.30
Speed 150 t/s
Google

Gemma 4 E4B

New
Google

Edge-class speed with multimodal coverage.

Smaller model loses nuance on hard tasks.

Qual 38
Ctx 128K
In/M
Out/M
Speed 200 t/s
Google

Gemma 3 12B

Google

Lightweight, easy to self-host.

Lower quality vs the 27B sibling.

Qual 38
Ctx 128K
In/M
Out/M
Speed 150 t/s
Meta

Llama 3.3 70B

Meta

Mature, well-optimized open-weights workhorse.

Text-only; older generation.

Qual 38
Ctx 128K
In/M $0.23
Out/M $0.40
Speed 85 t/s
Mistral

Mistral Small 4

New
Mistral

Cheap multimodal worker for high-volume jobs.

Quality cap on hard prompts.

Qual 38
Ctx 128K
In/M $0.10
Out/M $0.30
Speed 130 t/s
Mistral

Codestral

Mistral

Code-specialist; tuned for IDE-grade completion.

Code-only — not a general assistant.

Qual 36
Ctx 256K
In/M $0.30
Out/M $0.90
Speed 100 t/s
xAI

Grok 2 Mini

xAI

Cheap, fast Grok variant.

Lower quality; text-only.

Qual 36
Ctx 131K
In/M $0.20
Out/M $0.50
Speed 100 t/s
Groq

Llama 3 70B (Groq)

Groq

Fastest hosted Llama via custom LPU silicon.

Tiny 8K context.

Qual 35
Ctx 8K
In/M $0.59
Out/M $0.79
Speed 800 t/s
Inception

Mercury 2

New
Inception

Diffusion LLM with extreme inference speed.

New architecture; less proven on hard tasks.

Qual 33
Ctx 128K
In/M $0.38
Out/M $1.50
Speed 678 t/s
Google

Gemma 4 E2B

New
Google

Ultralight; runs on a single consumer GPU.

Reasoning ceiling is low.

Qual 32
Ctx 128K
In/M
Out/M
Speed 350 t/s
Alibaba

Qwen 3.5 0.8B

Alibaba

Sub-1B model for edge devices.

Very limited capability.

Qual 11
Ctx 262K
In/M $0.020
Out/M $0.080
Speed 200 t/s
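
The In/M and Out/M figures on the cards above read as US dollar prices per million input and output tokens. Under that assumption, a rough per-request cost can be sketched as follows. The function name `estimate_cost` and the token counts are illustrative and not part of any Orbitrage API; the prices are copied from the Grok 4.3 and GPT-5.5 cards.

```python
def estimate_cost(in_price_per_m, out_price_per_m, input_tokens, output_tokens):
    """Estimate the USD cost of one request from per-million-token prices.

    Assumes In/M and Out/M are USD per 1,000,000 tokens (hypothetical helper).
    """
    input_cost = input_tokens * in_price_per_m / 1_000_000
    output_cost = output_tokens * out_price_per_m / 1_000_000
    return input_cost + output_cost


# (In/M, Out/M) prices as listed on two of the cards above.
catalog = {
    "Grok 4.3": (1.25, 2.50),
    "GPT-5.5": (11.25, 45.00),
}

# Illustrative workload: 200K input tokens and 50K output tokens.
for name, (in_price, out_price) in catalog.items():
    cost = estimate_cost(in_price, out_price, input_tokens=200_000, output_tokens=50_000)
    print(f"{name}: ${cost:.3f}")
```

Because output tokens are usually priced higher than input tokens, the input/output mix of a workload can matter as much as the headline price when comparing cards.
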
OpenAI

Sora 2

New
OpenAI

Cinema-grade video with synchronized audio.

Slow generation; premium tier only.

Generation model (request access)
Google

Veo 3.1

New
Google

Latest Google video model with audio understanding.

Limited access; quotas apply.

Generation model (request access)
Google

Veo 3

New
Google

Strong realism with native audio output.

Eclipsed by Veo 3.1.

Generation model (request access)
Runway

Runway Gen-4.5

New
Runway

Best creative control — motion brushes, references.

Higher price per second.

Generation model (request access)
Luma AI

Ray 3.14

New
Luma AI

Fast iteration, photoreal output.

Shorter clip length.

Generation model (request access)
Kuaishou

Kling 3.0

New
Kuaishou

Long-form, realistic motion.

Slower generation.

Generation model (request access)
ByteDance

Seedance 2.0

New
ByteDance

Strong stylized motion with audio.

Less granular prompt control.

Generation model (request access)
MiniMax

Hailuo 2.3

MiniMax

Fluid camera moves; cheap.

Inconsistent character identity.

Generation model (request access)
Pika Labs

Pika 2.5

Pika Labs

Fun edit effects, fast turnaround.

Not photoreal.

Generation model (request access)
Alibaba

Wan 2.6

New
Alibaba

Open-weights video model.

Quality below state-of-the-art.

Generation model (request access)
Tencent

HunyuanVideo 1.5

New
Tencent

Open Tencent model; large-scale generation.

Heavy compute to run.

Generation model (request access)
Lightricks

LTX-2

New
Lightricks

Real-time open-weights video.

Lower fidelity than premium peers.

Generation model (request access)
Skywork AI

SkyReels V3

New
Skywork AI

Long, narrative-driven clips.

Niche use cases.

Generation model (request access)
Zhipu AI

CogVideoX 5B

Zhipu AI

Lightweight open video model.

Older generation; limited fidelity.

Generation model (request access)
Black Forest Labs

FLUX.2 Pro

New
Black Forest Labs

Photoreal SOTA with sharp text rendering.

Premium pricing; slower than FLUX.1 Pro.

Generation model (request access)
Black Forest Labs

FLUX.1 Pro

Black Forest Labs

Crisp realism, tight prompt adherence.

Eclipsed by FLUX.2 Pro.

Generation model (request access)
Midjourney

Midjourney v7

New
Midjourney

Best artistic style and composition.

Available only through Discord and the web app.

Generation model (request access)
Ideogram

Ideogram 3.0

New
Ideogram

Reliable text rendering inside images.

Less photoreal than FLUX.

Generation model (request access)
OpenAI

GPT Image 1

New
OpenAI

Native ChatGPT integration; great instruction following.

Tighter content rules.

Generation model (request access)
Google

Imagen 4

New
Google

Photoreal Google quality with strong prompt fidelity.

Restrictive content filters.

Generation model (request access)
Adobe

Adobe Firefly 5

New
Adobe

Commercially safe; tight Adobe tooling integration.

Conservative outputs.

Generation model (request access)
Stability AI

Stable Diffusion 3.5

Stability AI

Open weights; fully customizable pipeline.

Quality below frontier closed models.

Generation model (request access)
Suno

Suno v5.5

New
Suno

Full songs with vocals and structure.

Limited fine-grained control.

Generation model (request access)
Udio

Udio

Udio

Studio-quality vocal generation.

Smaller catalog of styles.

Generation model (request access)
MiniMax

MiniMax Music 2.5

New
MiniMax

Strong instrumentals plus vocals.

English-leaning training data.

Generation model (request access)
Mureka

Mureka V8

New
Mureka

Multilingual song output.

Less mainstream tooling.

Generation model (request access)
ElevenLabs

ElevenLabs Music

New
ElevenLabs

Crisp production quality.

Newer; smaller style library.

Generation model (request access)
Stability AI

Stable Audio 2.5

Stability AI

Open audio diffusion model.

Short max output length.

Generation model (request access)
ElevenLabs

ElevenLabs v3

New
ElevenLabs

Most expressive TTS on the market.

Noticeable latency on long inputs.

Generation model (request access)
ElevenLabs

Eleven Turbo v2.5

ElevenLabs

Low-latency real-time TTS.

Less expressive than v3.

Generation model (request access)
Mistral

Voxtral TTS

New
Mistral

Open multilingual TTS.

Smaller voice library.

Generation model (request access)
Nari Labs

Dia

New
Nari Labs

Open expressive dialogue voices.

One voice per checkpoint.

Generation model (request access)
Open Source

Kokoro v1.0

Open Source

Tiny open TTS — runs anywhere.

Limited voice variety.

Generation model (request access)
Fish Audio

Fish Audio S2 Pro

New
Fish Audio

High-fidelity voice cloning.

Less English coverage than ElevenLabs.

Generation model (request access)
Play.ht

PlayDialog

Play.ht

Natural multi-speaker dialogue.

Fewer base voices.

Generation model (request access)