
Grok 4.20

Grok, Balanced
Thinking, Tool Use, Vision, Structured Output

About this model

Grok 4.20 is xAI's newest flagship model, with industry-leading speed and agentic tool-calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...

Performance Tier

Balanced

Grok 4.20 is a balanced model from Grok: strong performance at a reasonable price.

Strong cost-performance ratio. Reliable for most professional use cases without premium pricing.

Pricing

This model is included in Elosia plans
Affordable

Low cost. Suitable for sustained use and high-volume interactions.

Type                  per 1M tokens
Input (prompt)        $1.25
Output (completion)   $2.50
Cache read            $0.200
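
As a rough illustration of how the rates above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the billing rule for cached tokens (charged at the cache-read rate instead of the input rate) are assumptions made for this example, not documented behavior.

```python
# Rates in USD per 1M tokens, copied from the pricing table above.
INPUT_RATE = 1.25
OUTPUT_RATE = 2.50
CACHE_READ_RATE = 0.20

TOKENS_PER_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return an estimated request cost in USD.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    and it is billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_RATE
        + cached_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical request: 200k prompt tokens (150k of them cached), 4k output tokens.
    print(f"${estimate_cost(200_000, 4_000, cached_tokens=150_000):.4f}")
```

Under these assumptions, a long prompt that is mostly served from cache costs far less than the same prompt sent fresh, because the cache-read rate is well below the input rate.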

Capabilities

Context Length      2.0M
Max Output Tokens
Tokenizer           Grok
Input               text, image, file
Output              text
Release Date        March 31, 2026

Benchmarks

General Intelligence
  • MMLU: 91.2%
  • GPQA Diamond: 88.5%

Mathematics
  • MATH-500: Not reported
  • AIME 2025: 93%

Programming
  • HumanEval: Not reported
  • SWE-bench Verified: 81%
  • LiveCodeBench: 79.4%

Reasoning
  • IFEval: 83%
  • ARC-AGI-2: 15.9%
  • Humanity's Last Exam: 35%

Recommended Use Cases

Coding, Analysis, Research, General Chat, Creative Writing

Strengths

  • 4-agent internal architecture reduces hallucinations by 65% — #1 on AA-Omniscience (78%)
  • Industry-leading 2M token context window for massive document analysis
  • Native multimodal understanding (text, image, video) with real-time X data access
  • #1 on IFBench (83%) and #2 on tau2-Bench (97%) for instruction following and agentic tool use

Limitations

  • Verbose output (~54M tokens during evaluation vs ~13M average) — higher cost per query
  • Weak abstract reasoning (ARC-AGI-2: 15.9%) compared to top competitors
  • Smaller third-party ecosystem than OpenAI/Anthropic
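
To put the verbosity figure in the first limitation into cost terms, the sketch below prices both token counts at this model's output rate ($2.50 per 1M tokens). Applying the same rate to the ~13M average is an assumption made only to isolate the effect of verbosity; other models are priced differently, and the names used here are illustrative.

```python
# Output rate in USD per 1M tokens, from the Pricing section.
OUTPUT_RATE_USD_PER_1M = 2.50

# Approximate evaluation token counts quoted in the Limitations list, in millions.
GROK_EVAL_TOKENS_M = 54
AVERAGE_EVAL_TOKENS_M = 13


def output_cost(tokens_millions: float, rate: float = OUTPUT_RATE_USD_PER_1M) -> float:
    """Cost in USD of generating `tokens_millions` million output tokens."""
    return tokens_millions * rate


# How many times more tokens were generated than the quoted average.
verbosity_ratio = GROK_EVAL_TOKENS_M / AVERAGE_EVAL_TOKENS_M

if __name__ == "__main__":
    print(f"Verbosity ratio: {verbosity_ratio:.2f}x")
    print(f"Output cost at 54M tokens: ${output_cost(GROK_EVAL_TOKENS_M):.2f}")
    print(f"Output cost at 13M tokens: ${output_cost(AVERAGE_EVAL_TOKENS_M):.2f}")
```

At an identical rate, the roughly fourfold difference in generated tokens carries straight through to output cost, which is why a low per-token price does not by itself guarantee a low cost per query.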

Resources

This model may use your data for training
