GPT

GPT-5.3 Codex

GPT, Specialized
Thinking, Tool Use, Vision, Structured Output

About this model

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work. Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.

Performance Tier

Specialized

GPT-5.3 Codex is a specialized model from GPT, built for a specific domain.

Domain-specific model. Optimized for a particular task such as code generation, image creation, or web search.

Pricing

This model is included in Elosia plans
Type                   per 1M tokens
Input (prompt)         $1.75
Output (completion)    $14.00
Cache read             $0.175
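
As a rough illustration of how these per-1M-token rates translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost` and its parameters are hypothetical, and the billing rule it uses (cached prompt tokens are charged at the cache read rate instead of the input rate) is an assumption, not something this page states.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICE_PER_MILLION = {
    "input": 1.75,
    "output": 14.00,
    "cache_read": 0.175,
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption (not confirmed by the listing): cached prompt tokens are
    billed at the cache read rate and all other prompt tokens at the
    input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 200K prompt tokens (50K of them cached) and 10K output tokens.
    cost = estimate_cost(input_tokens=200_000, output_tokens=10_000, cached_tokens=50_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Output tokens cost eight times as much as uncached input tokens at these rates, so long completions dominate the bill even for large prompts.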

Capabilities

Context Length         400K
Max Output Tokens      128K
Tokenizer              GPT
Input                  text, image, file
Output                 text
Release Date           February 24, 2026
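
The context and output limits above can be turned into a simple budget check before sending a request. The sketch below is illustrative only. It assumes that "400K" and "128K" mean 400,000 and 128,000 tokens, and that prompt and completion tokens share the same context window; the listing confirms neither. The helper name `max_prompt_tokens` is hypothetical.

```python
# Limits from the capabilities table, read as decimal thousands (an assumption).
CONTEXT_LENGTH = 400_000
MAX_OUTPUT_TOKENS = 128_000


def max_prompt_tokens(requested_output_tokens: int) -> int:
    """Return how many prompt tokens fit alongside the requested output.

    Assumption: prompt and completion tokens both count against the
    same context window.
    """
    if not 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(
            f"requested_output_tokens must be between 1 and {MAX_OUTPUT_TOKENS}"
        )
    return CONTEXT_LENGTH - requested_output_tokens


if __name__ == "__main__":
    # Reserving the full output allowance leaves the rest for the prompt.
    print(max_prompt_tokens(MAX_OUTPUT_TOKENS))
```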

Benchmarks

General Intelligence
  • MMLU: 93%
  • GPQA Diamond: 81%
Mathematics
  • MATH-500: 96%
Programming
  • HumanEval: 93%
Reasoning
  • IFEval: 94%
Multimodal
  • MMMU-Pro: 64%
Agentic
  • Terminal-Bench 2.0: 77.3%

Recommended Use Cases

Coding, Analysis, Research

Strengths

  • Unifies frontier coding and general reasoning in a single model
  • Approximately 25% faster than GPT-5.2-Codex with comparable or better quality
  • Record Terminal-Bench 2.0 score (77.3%) for real-world coding tasks
  • Strong professional knowledge and broad benchmark coverage (MMLU 93%)

Limitations

  • Premium pricing tier
  • Primarily optimized for coding; general conversation may be less natural than with GPT-5.2

Resources

This model may use your data for training
