Llama 4 Maverick

Llama 4 | Flagship
Tool Use | Vision | Structured Output

About this model

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

Performance Tier

Flagship

Maverick is the flagship model of the Llama 4 family: the most capable in the lineup.

Best-in-class model from this provider, with its highest performance across benchmarks. Ideal for demanding tasks.

Pricing

This model is included in Elosia plans.

Type                   Price per 1M tokens
Input (prompt)         $0.150
Output (completion)    $0.600
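
As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical. The sketch assumes billing is strictly linear in tokens, with no minimums, caching discounts, or image surcharges, none of which this page confirms.

```python
from decimal import Decimal

# Prices copied from the pricing table (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("0.150")
OUTPUT_PRICE_PER_M = Decimal("0.600")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request, assuming purely linear billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Hypothetical request: 200,000 prompt tokens, 4,000 completion tokens.
    print(f"${estimate_cost(200_000, 4_000)}")
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when summed over many calls.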

Capabilities

Context Length       1.0M
Max Output Tokens    16K
Tokenizer            Llama4
Input                text, image
Output               text
Release Date         April 5, 2025
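
The context and output limits above can be turned into a simple pre-flight check before sending a request. The sketch below is illustrative only: it reads "1.0M" as 1,000,000 tokens and "16K" as 16,000 tokens (the exact values may differ), and it assumes that prompt and completion tokens share the same context window. The helper name `clamp_output_budget` is hypothetical.

```python
# Assumed readings of the listed limits; the exact values are not stated on this page.
CONTEXT_LIMIT = 1_000_000   # "1.0M" context length
MAX_OUTPUT = 16_000         # "16K" max output tokens


def clamp_output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return the largest output budget that fits the assumed limits.

    Raises ValueError if the prompt alone fills the context window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= CONTEXT_LIMIT:
        raise ValueError("prompt does not fit in the context window")
    remaining = CONTEXT_LIMIT - prompt_tokens
    # The budget is capped by the request, the output limit, and the space left.
    return min(requested_output, MAX_OUTPUT, remaining)
```

For example, under these assumptions a 995,000-token prompt leaves room for at most 5,000 output tokens, regardless of the 16K output cap.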

Benchmarks

Category                Benchmark             Score
General Intelligence    MMLU                  88.2%
General Intelligence    MMLU-Pro              80.5%
General Intelligence    GPQA Diamond          69.8%
Mathematics             MATH-500              85%
Programming             HumanEval             89.5%
Programming             SWE-bench Verified    55%
Reasoning               IFEval                88.8%

Recommended Use Cases

Coding, Analysis, General Chat, Creative Writing, Research

Strengths

  • Most capable open-weight model from Meta
  • Strong general reasoning and coding (HumanEval 89.5%)
  • Native multimodal support (text + image)
  • Open-weight with commercially permissive license

Limitations

  • Lags behind top proprietary models on science reasoning (GPQA 69.8%)
  • Large model — requires significant compute for self-hosting
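
To give a sense of scale for the self-hosting limitation, the sketch below estimates the memory needed just to hold the weights. It uses the 400B total parameter count from the model description and assumes common storage precisions (2 bytes, 1 byte, and 0.5 bytes per parameter). It deliberately ignores the KV cache, activations, and runtime overhead, so real requirements would be higher. The function name `weight_memory_gb` is hypothetical.

```python
TOTAL_PARAMS = 400_000_000_000  # 400B total parameters, per the model description


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if params < 0 or bytes_per_param <= 0:
        raise ValueError("params must be >= 0 and bytes_per_param > 0")
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Assumed precisions: 16-bit, 8-bit, and 4-bit storage.
    for label, width in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS, width):.0f} GB")
```

Although only 17B parameters are active per forward pass, all experts generally have to be resident in memory to serve arbitrary inputs, which is why the total count drives the estimate.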

Resources

This model may use your data for training
