The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that combines linear attention with a sparse mixture-of-experts (MoE) design, improving inference efficiency. Compared with the Qwen3 series, they deliver a marked performance leap on both pure-text and multimodal tasks, offering fast response times while balancing inference speed against overall quality.
Performance Tier
Balanced
Qwen3.5 Flash is a balanced model from Qwen: strong performance at a reasonable price.
Strong cost-performance ratio. Reliable for most professional use cases without premium pricing.
Pricing
This model is included in Elosia plans
Input (prompt): $0.065 per 1M tokens
Output (completion): $0.260 per 1M tokens
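As a quick sanity check on these rates, the cost of a single request follows directly from the token counts. A minimal sketch (the request sizes below are hypothetical, not from the source):

```python
# Qwen3.5 Flash pricing, in USD per 1M tokens (from the table above)
INPUT_PRICE_PER_M = 0.065
OUTPUT_PRICE_PER_M = 0.260

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at these per-1M-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical request: a 10K-token prompt with a 2K-token completion
cost = request_cost(10_000, 2_000)
print(f"${cost:.6f}")  # 0.00065 (input) + 0.00052 (output) = $0.001170
```

Note that output tokens cost 4x input tokens, so long completions dominate the bill.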
Capabilities
Context Length: 1.0M
Max Output Tokens: 66K
Tokenizer: Qwen3
Input: text, image, video
Output: text
Release Date: February 25, 2026
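The two limits above interact when sizing a request. A minimal validity check, under the assumption (not stated in the source) that prompt and completion share the context window:

```python
# Context limits for Qwen3.5 Flash (from the table above)
CONTEXT_LENGTH = 1_000_000   # 1.0M-token context window
MAX_OUTPUT_TOKENS = 66_000   # 66K-token completion cap

def fits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Check a request against the output cap and the context window,
    assuming the prompt and completion share the window."""
    return (requested_output_tokens <= MAX_OUTPUT_TOKENS
            and prompt_tokens + requested_output_tokens <= CONTEXT_LENGTH)

print(fits(900_000, 50_000))   # within both limits
print(fits(900_000, 70_000))   # exceeds the 66K output cap
```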
Benchmarks
General Intelligence
MMLU: Not reported
GPQA Diamond: 84.2%
Mathematics
MATH-500: Not reported
Programming
HumanEval: Not reported
SWE-bench Verified: 69.2%
Reasoning
IFEval: 91.9%
Multimodal
MMMU-Pro: 75.1%
Recommended Use Cases
General Chat, Coding, Analysis, Research, Translation
Strengths
Frontier-class performance with only 3B active parameters (MoE)
Extremely affordable ($0.065/M input tokens) with near-flagship quality