Gemini 3.1 Flash Lite

Gemini, Compact

Thinking, Tool Use, Vision, Structured Output

About this model

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

Performance Tier

Compact

Gemini 3.1 Flash Lite is a compact model from Gemini: small, fast, and affordable. It is optimized for speed and low cost, which makes it a good fit for high-volume or simple tasks.

Pricing

This model is included in Elosia plans

Affordable

Low cost. Suitable for sustained use and high-volume interactions.

Type	Price per 1M tokens (USD)
Input (prompt)	$0.250
Output (completion)	$1.50
Image	$0.250
Internal reasoning	$1.50
Cache read	$0.025
Cache write	$0.083
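
As a rough illustration of how the per-token prices above translate into a bill, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` and the usage-dictionary keys are hypothetical, and the assumption that each token type is simply billed at its own rate and summed may not match how a given plan actually meters cached or reasoning tokens.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES_PER_MILLION = {
    "input": 0.250,
    "output": 1.50,
    "image": 0.250,
    "reasoning": 1.50,
    "cache_read": 0.025,
    "cache_write": 0.083,
}


def estimate_cost(usage):
    """Return an estimated USD cost for a dict of token counts keyed by type.

    Assumption: every token type is billed independently at its listed rate.
    """
    total = 0.0
    for kind, tokens in usage.items():
        if kind not in PRICES_PER_MILLION:
            raise KeyError(f"unknown token type: {kind}")
        total += tokens / 1_000_000 * PRICES_PER_MILLION[kind]
    return total


# Example: 1M prompt tokens plus 100K completion tokens.
print(round(estimate_cost({"input": 1_000_000, "output": 100_000}), 2))  # 0.4
```

Because output and internal reasoning tokens cost six times as much as input tokens, long completions or deep thinking settings are likely to dominate the total in this simple model.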

Capabilities

Context Length: 1.0M tokens

Max Output Tokens: 66K

Tokenizer: Gemini

Input: text, image, video, file, audio

Output: text

Release Date: March 3, 2026

Benchmarks

Category	Benchmark	Score
General Intelligence	MMLU	88.9%
General Intelligence	GPQA Diamond	86.9%
Mathematics	MATH-500	Not reported
Programming	HumanEval	Not reported
Reasoning	Humanity's Last Exam	16%
Multimodal	MMMU-Pro	76.8%

Recommended Use Cases

General Chat, Summarization, Data Extraction, Customer Support, Translation

Strengths

Ultra-fast inference (~389 tokens/s) at half the cost of Flash
1M token context window with full multimodal input (text, image, video, audio)
Strong general knowledge (MMLU 88.9%) for a lite model
Built-in thinking mode with configurable depth
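
To put the quoted throughput in perspective, here is a minimal sketch that converts the approximate 389 tokens/s figure into a streaming-time estimate. The function name `generation_seconds` is hypothetical, and the calculation assumes a constant decode rate and ignores time-to-first-token, network latency, and any thinking tokens, so real latencies will differ.

```python
# Approximate decode throughput quoted in the Strengths list (tokens per second).
TOKENS_PER_SECOND = 389.0


def generation_seconds(output_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Estimate streaming time for a response at a constant decode rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / tokens_per_second


# Example: a 1,000-token answer.
print(round(generation_seconds(1_000), 2))  # 2.57
```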

Limitations

Preview model — behavior may change
Complex reasoning is significantly weaker than the base Flash model (HLE 16% vs 43.5%)
Poor long-context performance at 1M tokens (MRCR 12.3%)

Resources

Official Documentation (model card)

This model may use your data for training
