Medium 3.5

Mistral, Flagship

Thinking, Tool Use, Vision, Structured Output

About this model

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Performance Tier

Flagship

Medium 3.5 is a flagship model from Mistral: the most capable in their lineup.

Best-in-class model from this provider. Highest performance across benchmarks, ideal for demanding tasks.

Pricing

This model is included in Elosia plans

Moderate

Moderate cost. A balanced choice for regular use without constant cap watching.

Type	per 1M tokens
Input (prompt)	$1.50
Output (completion)	$7.50
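
As a rough illustration of how the per-million-token rates above translate into a per-request cost, here is a minimal Python sketch. The helper name `estimate_cost` and the example token counts are assumptions made for illustration; they are not part of any provider API.

```python
# Rates taken from the pricing table, in USD per 1M tokens.
INPUT_RATE_PER_M = 1.50
OUTPUT_RATE_PER_M = 7.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    return total / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token completion (made-up sizes).
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0300
```

Because the output rate is five times the input rate, completion length tends to dominate the bill for anything other than very long prompts.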

Capabilities

Context Length	262K
Max Output Tokens	Not specified
Tokenizer	Mistral
Input	text, image, file
Output	text
Release Date	April 30, 2026

Benchmarks

General Intelligence

MMLU	Not reported
GPQA Diamond	Not reported

Mathematics

MATH-500	Not reported
AIME 2025	86.3%

Programming

HumanEval	Not reported
SWE-bench Verified	77.6%

Reasoning

IFEval	Not reported

Recommended Use Cases

Coding, Analysis, Mathematics, General Chat

Strengths

Top-tier agentic coding performance: 77.6% on SWE-bench Verified, ahead of Claude Sonnet 4.5 (77.2%) and Devstral 2 (72.2%)
Excellent math reasoning: 86.3% AIME 2025 avg@16, on par with Claude Sonnet 4.5/4.6 (86.7/86.9%)
Outstanding instruction following: 95.8% on Collie, surpassing Claude Sonnet 4.5 (90.5%) and competing top models
Best-in-class agentic tool-use across τ³ benchmarks: 91.4% Telecom, 76.1% Retail, 72.0% Airline
256K-token context window (262,144 tokens, shown as 262K under Capabilities), native multimodal vision, and configurable reasoning mode
Open weights under modified MIT license, EU-based provider for data sovereignty

Limitations

No published scores on MMLU, GPQA Diamond, MATH-500, or HumanEval, limiting comparison on classic general reasoning benchmarks
Web browsing is the weakest axis: 48.6% on BrowseComp, well behind Qwen3.5 (78.6%), GLM-5 (74.9%), and Kimi K2.5 (74.7%)
Verbose outputs (~5× the median token count) inflate cost per query
Premium pricing for an open-weight model: $1.50 / $7.50 per million input/output tokens
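
To make the verbosity point concrete, the sketch below compares one request at a baseline completion length with the same request at five times that length, using the listed rates of $1.50 and $7.50 per million tokens. The 1,000-token prompt and 400-token baseline completion are assumed values chosen only for illustration, and `request_cost` is a hypothetical helper.

```python
INPUT_RATE_PER_M = 1.50   # USD per 1M input tokens (listed rate)
OUTPUT_RATE_PER_M = 7.50  # USD per 1M output tokens (listed rate)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    return total / 1_000_000


PROMPT_TOKENS = 1_000    # assumed prompt size
BASELINE_OUTPUT = 400    # assumed "median" completion size
VERBOSITY_FACTOR = 5     # the ~5x figure from the limitation above

baseline = request_cost(PROMPT_TOKENS, BASELINE_OUTPUT)
verbose = request_cost(PROMPT_TOKENS, BASELINE_OUTPUT * VERBOSITY_FACTOR)

print(f"baseline: ${baseline:.4f}")              # baseline: $0.0045
print(f"verbose:  ${verbose:.4f}")               # verbose:  $0.0165
print(f"increase: {verbose / baseline:.2f}x")    # increase: 3.67x
```

Under these assumptions the total rises by less than 5× only because the prompt cost stays fixed; the shorter the prompt relative to the completion, the closer the increase gets to the full verbosity factor.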

Resources

Official Documentation, Benchmark Results

This model may use your data for training

Similar Models

Claude

DeepSeek