Gemini 3.1 Flash Lite

Gemini, Compact
Thinking, Tool Use, Vision, Structured Output

About this model

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. It supports the full set of thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs, and is priced at half the cost of Gemini 3 Flash.

Performance Tier

Compact

Gemini 3.1 Flash Lite is a compact model from Gemini: small, fast, and affordable. It is optimized for speed and low cost, which makes it a good fit for high-volume or simple tasks.

Pricing

This model is included in Elosia plans.

Type                   Price per 1M tokens
Input (prompt)         $0.250
Output (completion)    $1.50
Image                  $0.250
Internal reasoning     $1.50
Cache read             $0.025
Cache write            $0.083
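
As a rough illustration of how the per-1M-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the category keys are hypothetical, chosen only for this example. The page does not say how categories overlap in billing (for instance, whether cached tokens are also counted as input), so the sketch assumes each token is counted in exactly one category.

```python
# Prices in USD per 1M tokens, copied from the pricing table on this page.
# The dictionary keys are illustrative names, not official billing identifiers.
PRICE_PER_MILLION = {
    "input": 0.250,
    "output": 1.50,
    "image": 0.250,
    "reasoning": 1.50,
    "cache_read": 0.025,
    "cache_write": 0.083,
}


def estimate_cost(tokens: dict[str, int]) -> float:
    """Return an estimated cost in USD for the given token counts.

    Assumes each token is billed in exactly one category; the actual
    billing rules are not stated on this page.
    """
    total = 0.0
    for kind, count in tokens.items():
        if kind not in PRICE_PER_MILLION:
            raise ValueError(f"unknown token category: {kind}")
        if count < 0:
            raise ValueError(f"negative token count for {kind}")
        total += count * PRICE_PER_MILLION[kind] / 1_000_000
    return total


if __name__ == "__main__":
    # 200K prompt tokens, 10K completion tokens, 5K internal reasoning tokens.
    cost = estimate_cost({"input": 200_000, "output": 10_000, "reasoning": 5_000})
    print(f"${cost:.4f}")  # prints $0.0725
```

In this example the prompt contributes $0.05, the completion $0.015, and internal reasoning $0.0075. Output and reasoning tokens cost six times as much as input tokens, so higher thinking levels can dominate the bill even for short answers.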

Capabilities

Context Length       1.0M
Max Output Tokens    66K
Tokenizer            Gemini
Input                text, image, video, file, audio
Output               text
Release Date         March 3, 2026

Benchmarks

Category                Benchmark               Score
General Intelligence    MMLU                    88.9%
General Intelligence    GPQA Diamond            86.9%
Mathematics             MATH-500                Not reported
Programming             HumanEval               Not reported
Reasoning               Humanity's Last Exam    16%
Multimodal              MMMU-Pro                76.8%

Recommended Use Cases

General Chat, Summarization, Data Extraction, Customer Support, Translation

Strengths

  • Ultra-fast inference (~389 tokens/s) at half the cost of Flash
  • 1M token context window with full multimodal input (text, image, video, audio)
  • Strong general knowledge (MMLU 88.9%) for a lite model
  • Built-in thinking mode with configurable depth

Limitations

  • Preview model — behavior may change
  • Complex reasoning is significantly weaker than the base Flash model (HLE 16% vs 43.5%)
  • Poor long-context performance at 1M tokens (MRCR 12.3%)

Resources

This model may use your data for training
