Grok 4.20

Grok, Flagship
Thinking, Tool Use, Vision, Structured Output

About this model

Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool-calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...

Performance Tier

Flagship

Grok 4.20 is a flagship model from Grok: the most capable in its lineup.

Best-in-class model from this provider. Highest performance across benchmarks, ideal for demanding tasks.

Pricing

This model is included in Elosia plans
Moderate

Moderate cost. A balanced choice for regular use without constant cap watching.

Price per 1M tokens
Input (prompt): $2.00
Output (completion): $6.00
Cache read: $0.200
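
To make the per-1M-token prices concrete, here is a minimal Python sketch that estimates the cost of a single request. The helper name `estimate_cost_usd` and its parameters are hypothetical, and the sketch assumes that cached input tokens are billed at the cache-read rate instead of the regular input rate; the page does not spell out the billing rules, so treat this as an approximation.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
INPUT_USD_PER_M = 2.00
OUTPUT_USD_PER_M = 6.00
CACHE_READ_USD_PER_M = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Hypothetical helper: rough cost of one request in USD.

    Assumption: `cached_tokens` is the part of `input_tokens` served from
    cache and billed at the cache-read rate instead of the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHE_READ_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


# A 100k-token prompt with a 5k-token reply, no caching.
print(f"${estimate_cost_usd(100_000, 5_000):.3f}")
# The same request with 80k of the prompt tokens read from cache.
print(f"${estimate_cost_usd(100_000, 5_000, cached_tokens=80_000):.3f}")
```

Under these assumptions the uncached request comes to about $0.23 and the mostly cached one to about $0.09, which shows how much of the bill a long, reused prompt can account for.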

Capabilities

Context Length: 2.0M
Max Output Tokens: (not listed)
Tokenizer: Grok
Input: text, image, file
Output: text
Release Date: March 31, 2026

Benchmarks

General Intelligence
  MMLU: 91.2%
  GPQA Diamond: 88.5%
Mathematics
  MATH-500: Not reported
  AIME 2025: 93%
Programming
  HumanEval: Not reported
  SWE-bench Verified: 81%
  LiveCodeBench: 79.4%
Reasoning
  IFEval: 83%
  ARC-AGI-2: 15.9%
  Humanity's Last Exam: 35%

Recommended Use Cases

Coding, Analysis, Research, General Chat, Creative Writing

Strengths

  • 4-agent internal architecture reduces hallucinations by 65%; #1 on AA-Omniscience (78%)
  • Industry-leading 2M token context window for massive document analysis
  • Native multimodal understanding (text, image, video) with real-time X data access
  • #1 on IFBench (83%) and #2 on tau2-Bench (97%) for instruction following and agentic tool use

Limitations

  • Verbose output (~54M tokens generated during evaluation vs. a ~13M average), which raises the cost per query
  • Weak abstract reasoning (ARC-AGI-2: 15.9%) compared to top competitors
  • Smaller third-party ecosystem than OpenAI/Anthropic
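
The verbosity limitation can be put in rough dollar terms with the output price from the Pricing section. The sketch below assumes that all of the quoted evaluation tokens are billed as output tokens at $6.00 per 1M, and it prices the ~13M average at the same rate to isolate the effect of verbosity; the source does not say which evaluation produced these figures or how the tokens split, and the function name `output_cost_usd` is hypothetical.

```python
# Output price in USD per 1M tokens, from the Pricing section.
OUTPUT_USD_PER_M = 6.00


def output_cost_usd(millions_of_tokens: float) -> float:
    """Hypothetical helper: cost of generating the given number of
    output tokens (expressed in millions) at the listed output price."""
    return millions_of_tokens * OUTPUT_USD_PER_M


grok_cost = output_cost_usd(54)     # ~54M tokens during evaluation
average_cost = output_cost_usd(13)  # ~13M tokens for the average model

print(f"Grok 4.20: ${grok_cost:.2f}")
print(f"Average:   ${average_cost:.2f}")
print(f"Ratio:     {grok_cost / average_cost:.1f}x")
```

At equal per-token prices the same evaluation would cost roughly four times as much in output tokens, so capping response length where the API allows it may matter more here than with terser models.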

Resources

This model may use your data for training