Grok 4.20

Grok, Balanced

Thinking, Tool Use, Vision, Structured Output

About this model

Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering...

Performance Tier

Balanced

Grok 4.20 is a balanced model from Grok: strong performance at a reasonable price.

Strong cost-performance ratio. Reliable for most professional use cases without premium pricing.

Pricing

This model is included in Elosia plans

Affordable

Low cost. Suitable for sustained use and high-volume interactions.

Type	per 1M tokens
Input (prompt)	$1.25
Output (completion)	$2.50
Cache read	$0.200
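
The per-token rates above can be turned into a per-request estimate. The following is a minimal sketch, not the provider's billing logic: the function name `estimate_cost` is hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate. Usage covered by a plan may not be billed per token at all.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table.
PRICE_PER_MILLION = {
    "input": Decimal("1.25"),
    "output": Decimal("2.50"),
    "cache_read": Decimal("0.20"),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption (not stated on the page): cached_tokens are part of
    input_tokens and are charged at the cache-read rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)
```

Under these assumptions, `estimate_cost(100_000, 20_000, cached_tokens=50_000)` comes to 0.1225 USD: 50,000 uncached input tokens, 50,000 cache reads, and 20,000 output tokens.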

Capabilities

Context Length	2.0M tokens

Max Output Tokens	Not reported

Tokenizer	Grok

Input	text, image, file

Output	text

Release Date	March 31, 2026

Benchmarks

General Intelligence

MMLU

91.2%

GPQA Diamond

88.5%

Mathematics

MATH-500

Not reported

AIME 2025

93%

Programming

HumanEval

Not reported

SWE-bench Verified

81%

LiveCodeBench

79.4%

Reasoning

IFEval

83%

ARC-AGI-2

15.9%

Humanity's Last Exam

35%

Recommended Use Cases

Coding, Analysis, Research, General Chat, Creative Writing

Strengths

4-agent internal architecture reduces hallucinations by 65% — #1 on AA-Omniscience (78%)
Industry-leading 2M token context window for massive document analysis
Native multimodal understanding (text, image, video) with real-time X data access
#1 on IFBench (83%) and #2 on tau2-Bench (97%) for instruction following and agentic tool use

Limitations

Verbose output (~54M tokens during evaluation vs ~13M average) — higher cost per query
Weak abstract reasoning (ARC-AGI-2: 15.9%) compared to top competitors
Smaller third-party ecosystem than OpenAI/Anthropic
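
To put the verbosity limitation in cost terms, the token counts quoted above can be priced at the listed output rate of $2.50 per 1M tokens. This is a rough sketch under two assumptions: that the ~54M and ~13M figures are output tokens, and that both are priced at this model's rate so that only the effect of verbosity is compared.

```python
OUTPUT_PRICE_PER_MILLION = 2.50  # USD per 1M output tokens, from the pricing table

GROK_EVAL_TOKENS_M = 54     # ~54M tokens generated during evaluation
AVERAGE_EVAL_TOKENS_M = 13  # ~13M tokens for the average model


def output_cost(tokens_millions: float, price: float = OUTPUT_PRICE_PER_MILLION) -> float:
    """Cost in USD of generating `tokens_millions` million output tokens."""
    return tokens_millions * price


grok_cost = output_cost(GROK_EVAL_TOKENS_M)        # 54 * 2.50
average_cost = output_cost(AVERAGE_EVAL_TOKENS_M)  # 13 * 2.50
verbosity_ratio = grok_cost / average_cost         # equals 54 / 13

print(f"Grok 4.20: ${grok_cost:.2f}, average: ${average_cost:.2f}, "
      f"ratio: {verbosity_ratio:.2f}x")
```

At equal per-token prices the ratio depends only on the token counts, so the same workload would cost roughly four times as much in output tokens.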

Resources

Official Documentation, Benchmark Results, LM Arena Leaderboard

This model may use your data for training
