Models — Orbitrage

Grok 4.3

New

xAI

1M context, vision, competitive pricing vs frontier.

Newer — less community benchmarking than GPT/Claude.

Qual 72, Ctx 1M, In/M $1.25, Out/M $2.50, Speed 90 t/s

Grok 4.20 Reasoning

New

xAI

Built-in chain-of-thought at near-flagship price.

Slower than non-reasoning variants.

Qual 70, Ctx 1M, In/M $1.25, Out/M $2.50, Speed 85 t/s

Grok 4.20 Multi-Agent

New

xAI

Optimised for tool-calling and multi-agent loops.

Niche use-case; general tasks better on standard variants.

Qual 68, Ctx 1M, In/M $1.25, Out/M $2.50, Speed 88 t/s

Grok 4.20

New

xAI

Fast general-purpose model at the same price as the other Grok 4.x variants.

No reasoning trace; raw quality below frontier.

Qual 66, Ctx 1M, In/M $1.25, Out/M $2.50, Speed 95 t/s

GPT-5.5

New

OpenAI

Top multimodal generalist with native audio in/out.

Highest output price in its tier.

Qual 60, Ctx 922K, In/M $11.25, Out/M $45, Speed 72 t/s

Grok 4 Fast

xAI

Quick turnaround with vision support.

Context capped at 256k.

Qual 60, Ctx 256K, In/M $1.25, Out/M $2.50, Speed 120 t/s

Grok 3

xAI

Real-time X/Twitter context plus solid vision.

Pricier per token than the Grok 4.x series.

Qual 58, Ctx 131K, In/M $3, Out/M $15, Speed 80 t/s

Claude Opus 4.7

New

Anthropic

Frontier reasoning, agentic coding, 1M-token context.

Premium price; slower throughput than Sonnet/Haiku.

Qual 57, Ctx 1M, In/M $5, Out/M $25, Speed 48 t/s

GPT-5.4

New

OpenAI

Wide 1M+ context; strong all-rounder at half the cost of GPT-5.5.

Slower than Flash-class competitors.

Qual 57, Ctx 1.05M, In/M $5.63, Out/M $22.50, Speed 80 t/s

Gemini 3.1 Pro Preview

New

Google

Best long-context multimodal, including video understanding.

Preview-stage; quotas and edge-case quirks.

Qual 57, Ctx 1M, In/M $4.50, Out/M $18, Speed 116 t/s

o1

OpenAI

Hard math, science, planning — deep reasoning.

Expensive and slow; not for chat.

Qual 55, Ctx 200K, In/M $15, Out/M $60, Speed 30 t/s

Grok Build 0.1

xAI

Budget preview channel; priced below the Grok 4.x models.

Early preview — may change without notice.

Qual 55, Ctx 256K, In/M $1, Out/M $2, Speed 115 t/s

Claude Sonnet 4.6

New

Anthropic

Balanced flagship — most use cases at a sane price.

Trails Opus on the hardest reasoning tasks.

Qual 52, Ctx 200K, In/M $3, Out/M $15, Speed 65 t/s

DeepSeek V4 Pro

New

DeepSeek

State-of-the-art coding for the price.

Slow output; text-only.

Qual 52, Ctx 1M, In/M $0.14, Out/M $3.48, Speed 40 t/s

GLM-5.1

New

Zhipu AI

Open Chinese-English flagship with solid reasoning.

Limited multimodal support.

Qual 52, Ctx 200K, In/M not listed, Out/M not listed, Speed 60 t/s

o3-mini

OpenAI

Cost-efficient chain-of-thought reasoning.

No image input; thinks before answering.

Qual 50, Ctx 200K, In/M $1.10, Out/M $4.40, Speed 85 t/s

Llama 4 Maverick

New

GLM-5

New

Zhipu AI

Free open-weights with strong reasoning.

Trails GLM-5.1 on benchmarks.

Qual 50, Ctx 200K, In/M not listed, Out/M not listed, Speed 60 t/s

Kimi K2.5

New

Moonshot AI

Long-context Chinese flagship with vision.

Less common in Western tooling.

Qual 49, Ctx 256K, In/M not listed, Out/M not listed, Speed 70 t/s

MiniMax M2.7

New

MiniMax

Open agentic reasoning model.

Text-only; smaller community.

Qual 49, Ctx 205K, In/M not listed, Out/M not listed, Speed 90 t/s

GPT-4o

OpenAI

Native voice + image, fast multimodal pipelines.

Outclassed on quality by the 5.x line.

Qual 48, Ctx 128K, In/M $2.50, Out/M $10, Speed 90 t/s

DeepSeek V4 Flash

New

DeepSeek

Cheap, fast coding-tuned model.

Trails DeepSeek V4 Pro on hard reasoning.

Qual 47, Ctx 1M, In/M $0.14, Out/M $0.28, Speed 82 t/s

Qwen3 235B A22B

New

Alibaba

Open MoE with near-frontier reasoning.

Heavy to host; slower output.

Qual 47, Ctx 128K, In/M $0.14, Out/M $0.56, Speed 55 t/s

Gemini 3 Flash

New

Google

Highest throughput at this quality tier.

Lower reasoning ceiling than the Pro variant.

Qual 46, Ctx 1M, In/M $1.13, Out/M $4.50, Speed 161 t/s

Gemma 4 31B

New

Google

Open-weights flagship — full multimodal incl. video.

Hardware-hungry to serve at full quality.

Qual 46, Ctx 256K, In/M not listed, Out/M not listed, Speed 90 t/s

Gemini 1.5 Pro

Google

Mature 1M context with full audio + video understanding.

Outclassed by the 3.x generation.

Qual 45, Ctx 1M, In/M $1.25, Out/M $5, Speed 100 t/s

Llama 4 Scout

New

Mistral Large 3

New

Mistral

European flagship with strong vision.

Expensive vs comparable open peers.

Qual 45, Ctx 128K, In/M $2, Out/M $6, Speed 80 t/s

Gemma 4 26B MoE

New

Google

MoE efficiency at near-31B quality.

All experts must fit in memory at load time.

Qual 44, Ctx 256K, In/M not listed, Out/M not listed, Speed 120 t/s

Grok 2

xAI

Solid multimodal with real-time web context.

Behind the frontier on raw reasoning.

Qual 44, Ctx 131K, In/M $2, Out/M $10, Speed 75 t/s

Qwen 3.5 Max

New

Alibaba

Strong open multilingual model with vision.

Moderate speed for its size.

Qual 44, Ctx 262K, In/M $0.14, Out/M $0.56, Speed 100 t/s

DeepSeek R1

DeepSeek

Open reasoning model with visible chain-of-thought.

Short context; slow generation.

Qual 43, Ctx 64K, In/M $0.55, Out/M $2.19, Speed 45 t/s

Phi-4 Reasoning

New

Microsoft

Tiny model with surprising reasoning chops.

Only 16K context.

Qual 43, Ctx 16K, In/M $0.070, Out/M $0.14, Speed 110 t/s

Sonar Pro

Perplexity

Live web search baked into every reply.

Text-only; quality bound by search results.

Qual 43, Ctx 200K, In/M $3, Out/M $15, Speed 90 t/s

Gemma 3 27B

Google

Solid open vision-text baseline.

No audio/video; older generation.

Qual 42, Ctx 128K, In/M not listed, Out/M not listed, Speed 95 t/s

Grok 3 Mini

xAI

Low-cost Grok variant; good for simple tasks.

Text-only; limited reasoning depth.

Qual 42, Ctx 131K, In/M $0.30, Out/M $0.50, Speed 130 t/s

Claude Haiku 4.5

Anthropic

Cheap, fast Claude with image understanding.

Limited deep reasoning vs Sonnet/Opus.

Qual 40, Ctx 200K, In/M $0.80, Out/M $4, Speed 120 t/s

Phi-4 Multimodal

New

Microsoft

Smallest fully multimodal model — runs on a laptop.

Tiny context; modest quality.

Qual 40, Ctx 16K, In/M $0.070, Out/M $0.14, Speed 120 t/s

Phi-4 14B

Microsoft

Cheap reasoning baseline for simple jobs.

Text-only; short context.

Qual 40, Ctx 16K, In/M $0.070, Out/M $0.14, Speed 110 t/s

Command R+

Cohere

Tuned for retrieval / RAG and tool use.

Behind the frontier on raw IQ.

Qual 40, Ctx 128K, In/M $2.50, Out/M $10, Speed 70 t/s

GPT-4o mini

OpenAI

Cheapest OpenAI multimodal; great for high-volume tasks.

Quality dips on complex reasoning.

Qual 38, Ctx 128K, In/M $0.15, Out/M $0.60, Speed 110 t/s

Gemini 1.5 Flash

Google

Cheap and very fast for simple multimodal tasks.

Lower output quality on hard prompts.

Qual 38, Ctx 1M, In/M $0.075, Out/M $0.30, Speed 150 t/s

Gemma 4 E4B

New

Google

Edge-class speed with multimodal coverage.

Smaller model loses nuance on hard tasks.

Qual 38, Ctx 128K, In/M not listed, Out/M not listed, Speed 200 t/s

Gemma 3 12B

Google

Lightweight, easy to self-host.

Lower quality vs the 27B sibling.

Qual 38, Ctx 128K, In/M not listed, Out/M not listed, Speed 150 t/s

Llama 3.3 70B

Mistral Small 4

New

Mistral

Cheap multimodal worker for high-volume jobs.

Quality cap on hard prompts.

Qual 38, Ctx 128K, In/M $0.10, Out/M $0.30, Speed 130 t/s

Codestral

Mistral

Code-specialist; tuned for IDE-grade completion.

Code-only — not a general assistant.

Qual 36, Ctx 256K, In/M $0.30, Out/M $0.90, Speed 100 t/s

Grok 2 Mini

xAI

Cheap, fast Grok variant.

Lower quality; text-only.

Qual 36, Ctx 131K, In/M $0.20, Out/M $0.50, Speed 100 t/s

Llama 3 70B (Groq)

Groq

Fastest hosted Llama via custom LPU silicon.

Tiny 8K context.

Qual 35, Ctx 8K, In/M $0.59, Out/M $0.79, Speed 800 t/s

Mercury 2

New

Inception

Diffusion LLM with extreme inference speed.

New architecture; less proven on hard tasks.

Qual 33, Ctx 128K, In/M $0.38, Out/M $1.50, Speed 678 t/s

Gemma 4 E2B

New

Google

Ultralight; runs on a single consumer GPU.

Reasoning ceiling is low.

Qual 32, Ctx 128K, In/M not listed, Out/M not listed, Speed 350 t/s

Qwen 3.5 0.8B

Alibaba

Sub-1B model for edge devices.

Very limited capability.

Qual 11, Ctx 262K, In/M $0.020, Out/M $0.080, Speed 200 t/s
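To compare the priced cards above on a concrete workload, the listed rates can be turned into a per-request estimate. The sketch below assumes that In/M and Out/M are USD per million input and output tokens and that Speed is output tokens per second; the page does not define these labels, so treat that reading as an assumption. The function names and the 200K/20K-token workload are hypothetical, chosen only for illustration.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_per_m: float, out_per_m: float) -> float:
    """Estimated USD cost of one request, given per-million-token rates."""
    return (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough time to produce the output at the listed Speed (t/s)."""
    return output_tokens / tokens_per_second


# Rates copied from the cards above: (In/M, Out/M, Speed).
CARDS = {
    "Grok 4.3": (1.25, 2.50, 90),
    "Claude Opus 4.7": (5.00, 25.00, 48),
    "DeepSeek V4 Flash": (0.14, 0.28, 82),
}

# Hypothetical workload: 200K prompt tokens, 20K completion tokens.
for name, (in_rate, out_rate, speed) in CARDS.items():
    cost = request_cost_usd(200_000, 20_000, in_rate, out_rate)
    secs = generation_seconds(20_000, speed)
    print(f"{name}: ${cost:.4f}, ~{secs:.0f} s of generation")
```

The estimate ignores anything the cards do not list, such as caching discounts, reasoning tokens, or time to first token, so it is only useful for rough side-by-side comparison.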

Sora 2

New

OpenAI

Cinema-grade video with synchronized audio.

Slow generation; premium tier only.

Generation model (access on request)

Veo 3.1

New

Google

Latest Google video model with audio understanding.

Limited access; quotas apply.

Generation model (access on request)

Veo 3

New

Google

Strong realism with native audio output.

Eclipsed by Veo 3.1.

Generation model (access on request)

Runway Gen-4.5

New

Runway

Best creative control — motion brushes, references.

Higher price per second.

Generation model (access on request)

Ray 3.14

New

Luma AI

Fast iteration, photoreal output.

Shorter clip length.

Generation model (access on request)

Kling 3.0

New

Kuaishou

Long-form, realistic motion.

Slower generation.

Generation model (access on request)

Seedance 2.0

New

ByteDance

Strong stylized motion with audio.

Less granular prompt control.

Generation model (access on request)

Hailuo 2.3

MiniMax

Fluid camera moves; cheap.

Inconsistent character identity.

Generation model (access on request)

Pika 2.5

Pika Labs

Fun edit effects, fast turnaround.

Not photoreal.

Generation model (access on request)

Wan 2.6

New

Alibaba

Open-weights video model.

Quality below state-of-the-art.

Generation model (access on request)

HunyuanVideo 1.5

New

Tencent

Open Tencent model; large-scale generation.

Heavy compute to run.

Generation model (access on request)

LTX-2

New

Lightricks

Real-time open-weights video.

Lower fidelity than premium peers.

Generation model (access on request)

SkyReels V3

New

Skywork AI

Long, narrative-driven clips.

Niche use cases.

Generation model (access on request)

CogVideoX 5B

Zhipu AI

Lightweight open video model.

Older generation; limited fidelity.

Generation model (access on request)

FLUX.2 Pro

New

Black Forest Labs

Photoreal SOTA with sharp text rendering.

Premium pricing; slower than FLUX.1 Pro.

Generation model (access on request)

FLUX.1 Pro

Black Forest Labs

Crisp realism, tight prompt adherence.

Eclipsed by FLUX.2 Pro.

Generation model (access on request)

Midjourney v7

New

Midjourney

Best artistic style and composition.

Discord/web-only API surface.

Generation model (access on request)

Ideogram 3.0

New

Ideogram

Reliable text rendering inside images.

Less photoreal than FLUX.

Generation model (access on request)

GPT Image 1

New

OpenAI

Native ChatGPT integration; great instruction following.

Tighter content rules.

Generation model (access on request)

Imagen 4

New

Google

Photoreal Google quality with strong prompt fidelity.

Restrictive content filters.

Generation model (access on request)

Adobe Firefly 5

New

Adobe

Commercially safe; tight Adobe tooling integration.

Conservative outputs.

Generation model (access on request)

Stable Diffusion 3.5

Stability AI

Open weights; fully customizable pipeline.

Quality below frontier closed models.

Generation model (access on request)

Suno v5.5

New

Suno

Full songs with vocals and structure.

Limited fine-grained control.

Generation model (access on request)

Udio

Studio-quality vocal generation.

Smaller catalog of styles.

Generation model (access on request)

MiniMax Music 2.5

New

MiniMax

Strong instrumentals plus vocals.

English-leaning training data.

Generation model (access on request)

Mureka V8

New

Mureka

Multilingual song output.

Less mainstream tooling.

Generation model (access on request)

ElevenLabs Music

New

ElevenLabs

Crisp production quality.

Newer; smaller style library.

Generation model (access on request)

Stable Audio 2.5

Stability AI

Open audio diffusion model.

Short max output length.

Generation model (access on request)

ElevenLabs v3

New

ElevenLabs

Most expressive TTS on the market.

Higher latency on long inputs.

Generation model (access on request)

Eleven Turbo v2.5

ElevenLabs

Low-latency real-time TTS.

Less expressive than ElevenLabs v3.

Generation model (access on request)

Voxtral TTS

New

Mistral

Open multilingual TTS.

Smaller voice library.

Generation model (access on request)

Dia

New

Nari Labs

Open expressive dialogue voices.

One voice per checkpoint.

Generation model (access on request)

Kokoro v1.0

Open Source

Tiny open TTS — runs anywhere.

Limited voice variety.

Generation model (access on request)

Fish Audio S2 Pro

New

Fish Audio

High-fidelity voice cloning.

Less English coverage than ElevenLabs.

Generation model (access on request)

PlayDialog

Play.ht

Natural multi-speaker dialogue.

Fewer base voices.

Generation model (access on request)