GPT-5.3-Codex-Spark Launch
OpenAI has released GPT-5.3-Codex-Spark, a new ultra-fast coding model optimized for real-time collaboration and interactive development. This research preview marks the first milestone in OpenAI's partnership with Cerebras, announced in January. The model is designed to feel "near-instant" when making targeted code edits, refactoring logic, and iterating on interfaces, letting developers see results immediately.
Performance and Capabilities
Codex-Spark delivers exceptional speed while maintaining strong coding capability:
- Throughput: Over 1000 tokens per second on Cerebras' Wafer Scale Engine 3 hardware
- Context window: 128k tokens (text-only for this preview)
- SWE-Bench Pro: Achieves 50% accuracy on complex software engineering tasks in just 8 minutes (vs. GPT-5.3-Codex at ~18 minutes)
- Terminal-Bench 2.0: 58.4% accuracy for agentic command-line tasks
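To put the throughput figure in perspective, here is a rough latency estimate. The 1000 tokens-per-second rate comes from the benchmarks above; the edit sizes are illustrative assumptions:

```python
def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock seconds to stream a given number of output tokens."""
    return tokens / tokens_per_second

# A targeted edit of ~300 output tokens at 1000 tok/s:
print(f"{generation_time(300, 1000):.2f} s")  # → 0.30 s, hence "near-instant"

# The same edit at a more typical ~100 tok/s:
print(f"{generation_time(300, 100):.2f} s")   # → 3.00 s
```

At these rates, even a multi-hundred-token refactor streams back in well under a second, which is what makes interactive iteration feel immediate.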
The model complements OpenAI's frontier models, so the lineup now covers both long-running autonomous tasks and real-time interactive work. By default, Codex-Spark is lightweight in behavior: it makes minimal, targeted edits and does not automatically run tests unless asked.
System-Wide Latency Improvements
Beyond the model itself, OpenAI has implemented infrastructure optimizations that benefit all models:
- Per-request overhead: Reduced by 80% through persistent WebSocket connections
- Per-token overhead: Reduced by 30%
- Time-to-first-token: Reduced by 50% via streamlined response streaming and inference stack rewrites
These improvements come from the new WebSocket default path in the Responses API, which will roll out to all models in the coming weeks.
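The benefit of a persistent connection can be sketched with a simple cost model. All of the numbers below are illustrative assumptions, not measured values: with per-request HTTP, every call pays the connection setup cost, while over a persistent WebSocket the handshake is paid once.

```python
def total_latency_http(n_requests: int, handshake_ms: float, request_ms: float) -> float:
    # Each request opens its own connection: handshake cost paid every time.
    return n_requests * (handshake_ms + request_ms)

def total_latency_websocket(n_requests: int, handshake_ms: float, request_ms: float) -> float:
    # One persistent connection: handshake paid once, then only per-request work.
    return handshake_ms + n_requests * request_ms

# 50 interactive edits, 100 ms handshake, 40 ms per-request overhead (assumed figures):
print(total_latency_http(50, 100, 40))       # → 7000.0 ms
print(total_latency_websocket(50, 100, 40))  # → 2100.0 ms
```

The fixed setup cost amortizes across the whole session, which is why per-request overhead drops so sharply for chatty, interactive workloads.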
Availability and Access
Current availability:
- Research preview for ChatGPT Pro users in Codex app, CLI, and VS Code extension
- Limited API access for design partners
- Separate rate limits during preview; usage doesn't count toward standard API limits
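For design partners with API access, a request might look like the following sketch. The model identifier string and the exact payload shape are assumptions based on OpenAI's existing Responses API conventions, not confirmed details from the announcement:

```python
import json

# Hypothetical Responses API payload; the announcement names the model
# GPT-5.3-Codex-Spark, but the exact API identifier is an assumption.
payload = {
    "model": "gpt-5.3-codex-spark",
    "input": "Rename the variable `cfg` to `config` in utils.py",
}

print(json.dumps(payload, indent=2))
```

During the preview, such requests would draw on the separate rate limits noted above rather than standard API quotas.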
Future roadmap: OpenAI plans to expand access as they optimize performance under real workloads and eventually introduce larger models, longer context windows, and multimodal input capabilities.
Safety and Infrastructure Notes
Codex-Spark includes the same safety training as OpenAI's mainline models and passed cybersecurity evaluations under the Preparedness Framework. The model runs on Cerebras' specialized hardware, which complements rather than replaces GPUs; GPUs remain the cost-effective choice for broad inference. The architecture allows GPUs and Cerebras hardware to be combined for optimal performance on specific workloads.