Xiaomi

MiMo v2.5

Xiaomi · Balanced
Thinking · Tool Use · Vision · Structured Output

About this model

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...

Performance Tier

Balanced

MiMo v2.5 is a balanced model from Xiaomi: strong performance at a reasonable price.

Strong cost-performance ratio. Reliable for most professional use cases without premium pricing.

Pricing

This model is included in Elosia plans
Eco

Minimal cost. Ideal for very high volume or simple tasks.

Type                  Price per 1M tokens
Input (prompt)        $0.140
Output (completion)   $0.280
Cache read            $0.0028
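As a rough illustration of how the per-1M-token rates above combine, the sketch below estimates the cost of a single request. The function name `estimate_cost` is hypothetical, and the sketch assumes that prompt tokens served from cache are billed at the cache-read rate in place of the normal input rate. The provider's billing documentation is the authority on how cached tokens are actually counted.

```python
# Hypothetical cost estimator for MiMo v2.5 using the listed per-1M-token rates.
# Assumption: cached prompt tokens are billed at the cache-read rate instead of
# the normal input rate. Verify against the provider's billing docs.

INPUT_PER_M = 0.140        # USD per 1M uncached prompt tokens
OUTPUT_PER_M = 0.280       # USD per 1M completion tokens
CACHE_READ_PER_M = 0.0028  # USD per 1M prompt tokens served from cache


def estimate_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated USD cost of one request under the assumptions above."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + completion_tokens * OUTPUT_PER_M
    ) / 1_000_000


# A 200K-token prompt (150K of it cached) with a 4K-token answer
print(f"${estimate_cost(200_000, 4_000, cached_tokens=150_000):.6f}")  # $0.008540
```

Under these assumptions the example request costs about $0.0085, versus about $0.0291 if none of the prompt were cached.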

Capabilities

Context length: 1.0M tokens
Max output tokens: 131K
Tokenizer: Other
Input: text, audio, image, video
Output: text
Release date: April 22, 2026
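A client may want to check a request against these limits before sending it. This is a minimal sketch under stated assumptions: it treats "1.0M" as 1,000,000 tokens and "131K" as 131,072 tokens, and it assumes the prompt and the completion share the context window. The listing confirms none of these interpretations, and `max_completion_budget` is a hypothetical helper.

```python
# Hypothetical pre-flight check against the listed limits.
# Assumptions (not confirmed by the listing): "1.0M" = 1,000,000 tokens,
# "131K" = 131,072 tokens, and prompt + completion must fit in the context together.
CONTEXT_LENGTH = 1_000_000
MAX_OUTPUT_TOKENS = 131_072


def max_completion_budget(prompt_tokens: int) -> int:
    """Largest completion length that still fits, under the assumptions above."""
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(MAX_OUTPUT_TOKENS, remaining)


print(max_completion_budget(100_000))  # 131072: capped by the max output limit
print(max_completion_budget(950_000))  # 50000: capped by the remaining context
```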

Benchmarks

Category              Benchmark             Score
General Intelligence  MMLU                  Not reported
Mathematics           MATH-500              Not reported
Programming           HumanEval             Not reported
Multimodal            MMMU-Pro              77.9%
Agentic               SWE-bench Pro         56.1%
Agentic               Terminal-Bench 2.0    65.8%

Recommended Use Cases

Analysis, Research, Coding, Data Extraction

Strengths

  • Native omnimodal input: image, video and audio understanding in a single open-weight model, which is uncommon in its price band
  • Mixture-of-Experts (MoE) architecture with 310B total and 15B active parameters, released open-weight under the permissive MIT license, so it can be self-hosted
  • Very low pricing at $0.14/M input and $0.28/M output tokens, a fraction of what comparable multimodal models charge
  • 1M-token context with native long-context support across modalities
  • Strong self-reported document and multimodal understanding (MMMU-Pro: 77.9%)
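The MoE figures in the list above translate into a few back-of-envelope numbers. The sketch assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, and 2 bytes per parameter (BF16) for stored weights; neither figure comes from this listing, and real serving costs also depend on attention, KV cache size, and quantization.

```python
# Back-of-envelope numbers for the quoted MoE sizes (310B total, 15B active).
# Assumptions (not from the model card): ~2 FLOPs per active parameter per
# generated token, and 2 bytes per parameter (BF16) for the stored weights.
TOTAL_PARAMS = 310e9
ACTIVE_PARAMS = 15e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token_moe = 2 * ACTIVE_PARAMS
flops_per_token_dense = 2 * TOTAL_PARAMS   # a hypothetical dense model of the same total size
weight_memory_gb = TOTAL_PARAMS * 2 / 1e9  # all experts are typically kept resident when self-hosting

print(f"active fraction: {active_fraction:.1%}")  # 4.8%
print(f"compute per token: {flops_per_token_moe / 1e9:.0f} GFLOPs vs {flops_per_token_dense / 1e9:.0f} dense")
print(f"BF16 weights: ~{weight_memory_gb:.0f} GB")  # ~620 GB
```

The gap between the compute and memory numbers is the usual MoE trade-off: per-token compute scales with the 15B active parameters, while a self-hosted deployment typically still has to hold all 310B in memory.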

Limitations

  • Trails the MiMo-V2.5-Pro flagship on hard coding and reasoning tasks; it is positioned as the cheaper multimodal sibling
  • Standard text benchmarks are published only as chart images, so independent per-benchmark verification is still limited
  • Smaller ecosystem and tooling maturity than established Western multimodal providers


This model may use your data for training
