OpenAI
OpenAI releases GPT-5.3-Codex with agentic coding capabilities, setting a new SWE-Bench Pro high and running 25% faster
OpenAI API · OpenAI · release, feature, model, api · openai.com

New Agentic Coding Frontier

OpenAI has introduced GPT-5.3-Codex, a model aimed at AI-assisted software development. It combines the coding capabilities of GPT-5.2-Codex with the reasoning and professional knowledge of GPT-5.2, and it runs 25% faster while handling complex, long-running tasks. Unlike its predecessors, GPT-5.3-Codex can work autonomously on extended projects while remaining interactive: developers can steer it and provide feedback without losing context.

Notably, GPT-5.3-Codex played an instrumental role in its own development. The Codex team used early versions to debug training processes, manage deployments, and diagnose test results, demonstrating the model's ability to accelerate its own improvement cycle.

Benchmark Performance and Capabilities

GPT-5.3-Codex achieves state-of-the-art performance across multiple benchmarks:

  • SWE-Bench Pro: Sets a new industry high on real-world software engineering tasks across four programming languages. The benchmark is described as more contamination-resistant and more industry-relevant.
  • Terminal-Bench 2.0: Scores 77.3% (vs. 64.0% for GPT-5.2-Codex) on a benchmark that measures the terminal skills critical for coding agents
  • Token Efficiency: Delivers stronger performance while consuming fewer tokens, enabling users to build more within usage limits
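
To put the Terminal-Bench 2.0 figures in context, the sketch below works out the gap between the two reported scores in three common ways: absolute percentage points, relative improvement, and reduction in failure rate. The function name `compare_scores` and its return keys are illustrative assumptions, not part of any OpenAI tooling; only the 77.3% and 64.0% inputs come from the text above.

```python
def compare_scores(new_pct: float, old_pct: float) -> dict:
    """Compare two benchmark accuracies given in percent (hypothetical helper)."""
    # Absolute gain in percentage points.
    point_gain = new_pct - old_pct
    # Gain relative to the older score.
    relative_gain = point_gain / old_pct * 100
    # Share of the older model's failures that the newer model no longer makes.
    old_error = 100 - old_pct
    new_error = 100 - new_pct
    error_reduction = (old_error - new_error) / old_error * 100
    return {
        "point_gain": round(point_gain, 1),
        "relative_gain_pct": round(relative_gain, 1),
        "error_reduction_pct": round(error_reduction, 1),
    }


# Terminal-Bench 2.0 scores reported above: GPT-5.3-Codex vs. GPT-5.2-Codex.
result = compare_scores(77.3, 64.0)
print(result)
# {'point_gain': 13.3, 'relative_gain_pct': 20.8, 'error_reduction_pct': 36.9}
```

Read this way, the 13.3-point gain corresponds to roughly a 21% relative improvement in accuracy, or about a 37% drop in failed tasks, assuming both scores were measured on the same task set.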

Web Development and Long-Running Tasks

GPT-5.3-Codex demonstrates striking capabilities in web development. In testing, the model autonomously iterated on complex games over millions of tokens, building fully functional applications from scratch. It also shows improved intent understanding for day-to-day website development, with better defaults for aesthetic choices, functional layouts, and production-ready designs.

Developer Action Items: The model is available via the Codex app waitlist. Developers can now delegate longer-horizon tasks, from multi-day projects to complex tool integration workflows, with improved autonomy and reasoning capabilities.