Grok 4.3
New: 1M context, vision, competitive pricing vs frontier.
Newer — less community benchmarking than GPT/Claude.
Hundreds of models across every major provider. Each entry shows where it shines and where it struggles — so the router's pick is never a black box.
Built-in chain-of-thought at near-flagship price.
Slower than non-reasoning variants.
Optimised for tool-calling and multi-agent loops.
Niche use case; general tasks fare better on standard variants.
Fast general-purpose model in the grok-3 price bracket.
No reasoning trace; raw quality below frontier.
Top multimodal generalist with native audio in/out.
Highest output price in its tier.
Quick turnaround with vision support.
Context capped at 256k.
Real-time X/Twitter context plus solid vision.
Pricier per token than grok-4.x series.
Frontier reasoning, agentic coding, 1M-token context.
Premium price; slower throughput than Sonnet/Haiku.
Wide 1M+ context, strong all-rounder at half the cost.
Slower than Flash-class competitors.
Best long-context multimodal, including video understanding.
Preview-stage; quotas and edge-case quirks.
Hard math, science, planning — deep reasoning.
Expensive and slow; not for chat.
Budget preview channel; cheapest xAI model.
Early preview — may change without notice.
Balanced flagship — most use cases at a sane price.
Trails Opus on the hardest reasoning tasks.
State-of-the-art coding for the price.
Slow output; text-only.
Open Chinese-English flagship with solid reasoning.
Limited multimodal support.
Cost-efficient chain-of-thought reasoning.
No image input; reasoning step adds latency before it answers.
Open-weights with 1M context and image support.
Slower hosted speeds than closed peers.
Free open-weights with strong reasoning.
Trails 5.1 on benchmarks.
Long-context Chinese flagship with vision.
Less common in Western tooling.
Open agentic reasoning model.
Text-only; smaller community.
Native voice + image, fast multimodal pipelines.
Outclassed on quality by the 5.x line.
Cheap, fast coding-tuned model.
Trails Pro on hard reasoning.
Open MoE with near-frontier reasoning.
Heavy to host; slower output.
Highest throughput at this quality tier.
Lower reasoning ceiling than the Pro variant.
Open-weights flagship — full multimodal incl. video.
Hardware-hungry to serve at full quality.
Mature 1M context with full audio + video understanding.
Outclassed by the 3.x generation.
Massive 10M token context — needle-in-haystack king.
Quality below Maverick on dense reasoning.
European flagship with strong vision.
Expensive vs comparable open peers.
MoE efficiency at near-31B quality.
All experts must fit in memory at load time.
Solid multimodal with real-time web context.
Behind the frontier on raw reasoning.
Strong open multilingual model with vision.
Moderate speed for its size.
Open reasoning model with visible chain-of-thought.
Short context; slow generation.
Tiny model with surprising reasoning chops.
Only 16K context.
Live web search baked into every reply.
Text-only; quality bound by search results.
Solid open vision-text baseline.
No audio/video; older generation.
Low-cost Grok variant; good for simple tasks.
Text-only; limited reasoning depth.
Cheap, fast Claude with image understanding.
Limited deep reasoning vs Sonnet/Opus.
Smallest fully multimodal model — runs on a laptop.
Tiny context; modest quality.
Cheap reasoning baseline for simple jobs.
Text-only; short context.
Tuned for retrieval / RAG and tool use.
Behind the frontier on raw IQ.
Cheapest OpenAI multimodal; great for high-volume tasks.
Quality dips on complex reasoning.
Cheap and very fast for simple multimodal tasks.
Lower output quality on hard prompts.
Edge-class speed with multimodal coverage.
Smaller model loses nuance on hard tasks.
Lightweight, easy to self-host.
Lower quality vs the 27B sibling.
Mature, well-optimized open-weights workhorse.
Text-only; older generation.
Cheap multimodal worker for high-volume jobs.
Quality cap on hard prompts.
Code-specialist; tuned for IDE-grade completion.
Code-only — not a general assistant.
Cheap, fast Grok variant.
Lower quality; text-only.
Fastest hosted Llama via custom LPU silicon.
Tiny 8K context.
Diffusion LLM with extreme inference speed.
New architecture; less proven on hard tasks.
Ultralight; runs on a single consumer GPU.
Reasoning ceiling is low.
Sub-1B model for edge devices.
Very limited capability.
Cinema-grade video with synchronized audio.
Slow generation; premium tier only.
Latest Google video model with audio understanding.
Limited access; quotas apply.
Strong realism with native audio output.
Eclipsed by 3.1.
Best creative control — motion brushes, references.
Higher price per second.
Fast iteration, photoreal output.
Shorter clip length.
Long-form, realistic motion.
Slower generation.
Strong stylized motion with audio.
Less granular prompt control.
Fluid camera moves; cheap.
Inconsistent character identity.
Fun edit effects, fast turnaround.
Not photoreal.
Open-weights video model.
Quality below state-of-the-art.
Open Tencent model; large-scale generation.
Heavy compute to run.
Real-time open-weights video.
Lower fidelity than premium peers.
Long, narrative-driven clips.
Niche use cases.
Lightweight open video model.
Older generation; limited fidelity.
Photoreal SOTA with sharp text rendering.
Premium pricing; slower than 1 Pro.
Crisp realism, tight prompt adherence.
Eclipsed by 2 Pro.
Best artistic style and composition.
Discord/web-only API surface.
Reliable text rendering inside images.
Less photoreal than FLUX.
Native ChatGPT integration; great instruction following.
Tighter content rules.
Photoreal Google quality with strong prompt fidelity.
Restrictive content filters.
Commercially safe; tight Adobe tooling integration.
Conservative outputs.
Open weights; fully customizable pipeline.
Quality below frontier closed models.
Full songs with vocals and structure.
Limited fine-grained control.
Studio-quality vocal generation.
Smaller catalog of styles.
Strong instrumentals plus vocals.
English-leaning training data.
Multilingual song output.
Less mainstream tooling.
Crisp production quality.
Newer; smaller style library.
Open audio diffusion model.
Short max output length.
Most expressive TTS on the market.
Higher latency on long inputs.
Low-latency real-time TTS.
Less expressive than v3.
Open multilingual TTS.
Smaller voice library.
Open expressive dialogue voices.
One voice per checkpoint.
Tiny open TTS — runs anywhere.
Limited voice variety.
High-fidelity voice cloning.
Less English coverage than Eleven.
Natural multi-speaker dialogue.
Fewer base voices.