Mistral

Medium 3.5

Mistral · Flagship
Thinking · Tool Use · Vision · Structured Output

About this model

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Performance Tier

Flagship

Medium 3.5 is a flagship model from Mistral: the most capable in their lineup.

Best-in-class model from this provider. Highest performance across benchmarks, ideal for demanding tasks.

Pricing

This model is included in Elosia plans
Moderate

Moderate cost. A balanced choice for regular use without constantly watching usage caps.

Type                   per 1M tokens
Input (prompt)         $1.50
Output (completion)    $7.50
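
As a rough illustration of how the per-token rates above translate into a per-request figure, here is a minimal sketch. It assumes usage is billed linearly at the listed rates and ignores any plan inclusions; the function name and the example token counts are hypothetical.

```python
# Prices per 1M tokens, taken from the pricing table above.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 7.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates (assumed linear billing)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 20,000 prompt tokens and 2,000 completion tokens.
print(f"${estimate_cost_usd(20_000, 2_000):.4f}")
```

Because output tokens cost five times as much as input tokens, completion length tends to dominate the bill for short prompts.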

Capabilities

Context Length       262K
Max Output Tokens    (not listed)
Tokenizer            Mistral
Input                text, image
Output               text
Release Date         April 30, 2026

Benchmarks

General Intelligence
  MMLU                  Not reported
  GPQA Diamond          Not reported
Mathematics
  MATH-500              Not reported
  AIME 2025             86.3%
Programming
  HumanEval             Not reported
  SWE-bench Verified    77.6%
Reasoning
  IFEval                Not reported

Recommended Use Cases

Coding, Analysis, Mathematics, General Chat

Strengths

  • Top-tier agentic coding performance: 77.6% on SWE-bench Verified, ahead of Claude Sonnet 4.5 (77.2%) and Devstral 2 (72.2%)
  • Excellent math reasoning: 86.3% on AIME 2025 (avg@16), on par with Claude Sonnet 4.5 (86.7%) and Claude Sonnet 4.6 (86.9%)
  • Outstanding instruction following: 95.8% on Collie, surpassing Claude Sonnet 4.5 (90.5%) and competing top models
  • Best-in-class agentic tool-use across τ³ benchmarks: 91.4% Telecom, 76.1% Retail, 72.0% Airline
  • 256K context window (listed as 262K under Capabilities), native multimodal vision, and configurable reasoning mode
  • Open weights under modified MIT license, EU-based provider for data sovereignty

Limitations

  • No published scores on MMLU, GPQA Diamond, MATH-500, or HumanEval, limiting comparison on classic general reasoning benchmarks
  • Web browsing is the weakest axis: 48.6% on BrowseComp, well behind Qwen3.5 (78.6%), GLM-5 (74.9%), and Kimi K2.5 (74.7%)
  • Verbose outputs (~5× the median token count) inflate cost per query
  • Premium pricing for an open-weight model: $1.50 / $7.50 per million input/output tokens
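
To make the verbosity point concrete, the sketch below compares the cost of one query at a median-length completion with the same query at roughly 5× that length, using the listed $1.50 / $7.50 rates. The token counts are hypothetical, and the comparison holds the price constant to isolate the effect of output length.

```python
# Listed prices per 1M tokens.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 7.50
VERBOSITY_FACTOR = 5  # "~5x the median token count", from the limitation above


def query_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single query, assuming linear per-token billing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical query: 2,000 prompt tokens, 500 completion tokens at median length.
input_tokens = 2_000
median_output_tokens = 500

baseline = query_cost_usd(input_tokens, median_output_tokens)
verbose = query_cost_usd(input_tokens, median_output_tokens * VERBOSITY_FACTOR)

print(f"median-length output: ${baseline:.5f}")
print(f"5x-length output:     ${verbose:.5f}")
print(f"cost ratio:           {verbose / baseline:.2f}x")
```

The ratio stays below 5× because the input cost is unchanged; it approaches 5× as the prompt gets shorter relative to the completion.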

Resources

This model may use your data for training
