Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. It supports the full set of thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs, and it is priced at half the cost of Gemini 3 Flash.
Performance Tier
Compact
Gemini 3.1 Flash Lite is a compact model from the Gemini family, optimized for speed and affordability.
Small, fast, and affordable. Optimized for speed and low cost, great for high-volume or simple tasks.
Pricing
This model is included in Elosia plans
Type: price per 1M tokens
Input (prompt): $0.250
Output (completion): $1.50
Image: $0.250
Internal reasoning: $1.50
Cache read: $0.025
Cache write: $0.083
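As a rough illustration of how the per-1M-token rates above translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost` and the category keys are hypothetical, and the sketch assumes each token category is billed linearly and independently at its listed rate; the actual billing rules (for example, how cached tokens interact with regular input tokens) may differ.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
# The dictionary keys are illustrative names, not official API fields.
PRICES_PER_MILLION = {
    "input": 0.250,
    "output": 1.50,
    "image": 0.250,
    "reasoning": 1.50,
    "cache_read": 0.025,
    "cache_write": 0.083,
}


def estimate_cost(**token_counts: int) -> float:
    """Estimate the USD cost of a request from per-category token counts.

    Assumes simple linear billing: tokens / 1,000,000 * listed rate.
    """
    total = 0.0
    for kind, tokens in token_counts.items():
        if kind not in PRICES_PER_MILLION:
            raise KeyError(f"unknown token category: {kind}")
        if tokens < 0:
            raise ValueError(f"token count for {kind} must be non-negative")
        total += tokens / 1_000_000 * PRICES_PER_MILLION[kind]
    return total


if __name__ == "__main__":
    # Example: a 10,000-token prompt, 500 output tokens, 200 reasoning tokens.
    cost = estimate_cost(input=10_000, output=500, reasoning=200)
    print(f"Estimated cost: ${cost:.5f}")
```

Under these assumptions, output and internal reasoning tokens cost six times as much as input tokens, so the chosen thinking level can dominate the bill for short prompts.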
Capabilities
Context Length: 1.0M
Max Output Tokens: 66K
Tokenizer: Gemini
Input: text, image, video, file, audio
Output: text
Release Date: March 3, 2026
Benchmarks
General Intelligence
MMLU: 88.9%
GPQA Diamond: 86.9%
Mathematics
MATH-500: Not reported
Programming
HumanEval: Not reported
Reasoning
Humanity's Last Exam: 16%
Multimodal
MMMU-Pro: 76.8%
Recommended Use Cases
General Chat, Summarization, Data Extraction, Customer Support, Translation
Strengths
Ultra-fast inference (~389 tokens/s) at half the cost of Gemini 3 Flash
1M token context window with full multimodal input (text, image, video, audio)
Strong general knowledge (MMLU 88.9%) for a lite model
Built-in thinking mode with configurable depth
Limitations
Preview model — behavior may change
Complex reasoning significantly weaker than Flash base (HLE 16% vs 43.5%)
Poor long-context performance at 1M tokens (MRCR 12.3%)