DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts (MoE) model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference.
Performance Tier
Balanced
DeepSeek V4 Flash is a balanced model from DeepSeek: strong performance at a reasonable price.
Strong cost-performance ratio. Reliable for most professional use cases without premium pricing.
Pricing
This model is included in Elosia plans
Eco
Minimal cost. Ideal for very high volume or simple tasks.
Type
per 1M tokens
Input (prompt)
$0.140
Output (completion)
$0.280
Cache read
$0.0028
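As a rough illustration of the per-1M-token rates above, the sketch below estimates the cost of a single request. The helper name and the example token counts are hypothetical; the rates are the listed input, output, and cache-read prices.

```python
# Per-1M-token rates from the pricing table above (USD).
INPUT_RATE = 0.140
OUTPUT_RATE = 0.280
CACHE_READ_RATE = 0.0028

def estimate_cost(fresh_input_tokens: int,
                  cached_input_tokens: int,
                  output_tokens: int) -> float:
    """Estimate request cost in USD from token counts.

    Fresh prompt tokens are billed at the input rate, cache hits
    at the cache-read rate, and completion tokens at the output rate.
    """
    return (fresh_input_tokens / 1_000_000 * INPUT_RATE
            + cached_input_tokens / 1_000_000 * CACHE_READ_RATE
            + output_tokens / 1_000_000 * OUTPUT_RATE)

# Example: 100K fresh input, 100K cached input, 50K output.
print(f"${estimate_cost(100_000, 100_000, 50_000):.5f}")  # → $0.02828
```

Note how heavily cache reads are discounted: a fully cached prompt costs 2% of what the same prompt costs uncached.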
Capabilities
Context Length: 1.0M
Max Output Tokens: 384K
Tokenizer: DeepSeek
Input: text
Output: text
Release Date: April 24, 2026
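The context and output limits above can be validated before sending a request. A minimal sketch follows; the function name is hypothetical, and it assumes completion tokens count against the context window, which is typical but not stated on this page.

```python
# Limits from the capabilities table above.
MAX_CONTEXT_TOKENS = 1_000_000   # 1.0M context window
MAX_OUTPUT_TOKENS = 384_000      # 384K max output tokens

def fits_limits(prompt_tokens: int, max_output: int) -> bool:
    """Return True if a request stays within the model's limits.

    Assumes the completion budget counts against the context
    window alongside the prompt.
    """
    return (max_output <= MAX_OUTPUT_TOKENS
            and prompt_tokens + max_output <= MAX_CONTEXT_TOKENS)

print(fits_limits(900_000, 50_000))   # within both limits
print(fits_limits(100_000, 400_000))  # exceeds the 384K output cap
```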
Benchmarks
General Intelligence
MMLU
88.7%
MMLU-Pro
86.2%
GPQA Diamond
88.1%
Mathematics
MATH-500
Not reported
Programming
HumanEval
69.5%
SWE-bench Verified
79.0%
LiveCodeBench
91.6%
Agentic
Terminal-Bench 2.0
56.9%
Recommended Use Cases
Coding, Mathematics, Analysis, General Chat
Strengths
Outstanding cost-to-performance — ~3x cheaper than v4 Pro for near-frontier reasoning
MoE 284B total / 13B active — high throughput ideal for agents and coding assistants
1M-token context with the same hybrid sparse attention as v4 Pro
Open-weight MIT — deployable on-prem
Limitations
Lower factual knowledge density than v4 Pro for recall-heavy tasks
Max reasoning mode adds significant latency — not ideal for real-time UX
Smaller MoE — weaker performance on creative writing vs proprietary models