Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...
Performance Tier
Flagship
Qwen 3.7 Max is a flagship model from Qwen : the most capable in their lineup.
Best-in-class model from this provider. Highest performance across benchmarks, ideal for demanding tasks.
Pricing
This model is included in Elosia plans
Moderate
Moderate cost. A balanced choice for regular use without constant cap watching.
Type
per 1M tokens
Input (prompt)
$2.50
Output (completion)
$7.50
Cache write
$3.13
Capabilities
Context Length1.0M
Max Output Tokens66K
TokenizerQwen
Inputtext
Outputtext
Release DateMay 21, 2026
Benchmarks
General Intelligence
MMLU
Not reported
GPQA Diamond
92.4%
Mathematics
MATH-500
Not reported
Programming
HumanEval
Not reported
SWE-bench Verified
80.4%
SWE-bench Multilingual
78.3%
Reasoning
IFEval
Not reported
Humanity's Last Exam
41.4%
Agentic
SWE-bench Pro
60.6%
Terminal-Bench 2.0
69.7%
Recommended Use Cases
CodingAnalysisResearchGeneral ChatTranslation
Strengths
Alibaba's most capable agent model — leads SWE-bench Verified (80.4%) on par with Claude Opus 4.6
Top-tier scientific reasoning (GPQA Diamond 92.4%) — ahead of Claude Opus 4.6 (91.3%)
Strong agentic coding across languages (SWE-bench Multilingual 78.3%, SWE-bench Pro 60.6%)
Leads Terminal-Bench 2.0 (69.7%) for long-horizon command-line workflows
Built-in thinking mode (enable_thinking / preserve_thinking) for chain-of-thought reasoning
Limitations
Alibaba's evaluation focuses on agentic and coding benchmarks — MMLU, MATH-500, HumanEval, AIME and LiveCodeBench are not part of the reported suite
Native context advertised at 256K tokens (1M only mentioned for some configurations)
Premium pricing among Qwen models ($2.50 / $7.50 per M tokens)