GPT-5.3-Codex-Spark Launch
OpenAI has released GPT-5.3-Codex-Spark, a new ultra-fast coding model optimized for real-time collaboration and interactive development. This research preview marks the first milestone in OpenAI's partnership with Cerebras, announced in January. The model is designed to feel "near-instant" when making targeted code edits, refactoring logic, and iterating on interfaces, letting developers see results immediately.
Performance and Capabilities
Codex-Spark delivers exceptional speed while maintaining strong coding capability:
- Throughput: Over 1000 tokens per second on Cerebras' Wafer Scale Engine 3 hardware
- Context window: 128k tokens (text-only for this preview)
- SWE-Bench Pro: Achieves 50% accuracy on complex software engineering tasks in just 8 minutes (vs. GPT-5.3-Codex at ~18 minutes)
- Terminal-Bench 2.0: 58.4% accuracy for agentic command-line tasks
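To put the throughput figure in perspective, here is a rough latency estimate. The 1000 tokens-per-second rate comes from the benchmarks above; the edit sizes are illustrative assumptions:

```python
def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock seconds to stream a given number of output tokens."""
    return tokens / tokens_per_second

# A targeted edit of ~300 output tokens at 1000 tok/s:
print(f"{generation_time(300, 1000):.2f} s")  # → 0.30 s, hence "near-instant"

# The same edit at a more typical ~100 tok/s:
print(f"{generation_time(300, 100):.2f} s")   # → 3.00 s
```

At these rates, even a multi-hundred-token refactor streams back in well under a second, which is what makes interactive iteration feel immediate.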
The model complements OpenAI's frontier models, so the lineup now covers both long-running autonomous tasks and real-time interactive work. By default, Codex-Spark is lightweight in behavior: it makes minimal, targeted edits and does not automatically run tests unless asked.
System-Wide Latency Improvements
Beyond the model itself, OpenAI has implemented infrastructure optimizations that benefit all models:
- Per-request overhead: Reduced by 80% through persistent WebSocket connections
- Per-token overhead: Reduced by 30%
- Time-to-first-token: Reduced by 50% via streamlined response streaming and inference stack rewrites
These improvements come from the new WebSocket default path in the Responses API, which will roll out to all models in the coming weeks.
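The benefit of a persistent connection can be sketched with a simple cost model. All of the numbers below are illustrative assumptions, not measured values: with per-request HTTP, every call pays the connection setup cost, while over a persistent WebSocket the handshake is paid once.

```python
def total_latency_http(n_requests: int, handshake_ms: float, request_ms: float) -> float:
    # Each request opens its own connection: handshake cost paid every time.
    return n_requests * (handshake_ms + request_ms)

def total_latency_websocket(n_requests: int, handshake_ms: float, request_ms: float) -> float:
    # One persistent connection: handshake paid once, then only per-request work.
    return handshake_ms + n_requests * request_ms

# 50 interactive edits, 100 ms handshake, 40 ms per-request overhead (assumed figures):
print(total_latency_http(50, 100, 40))       # → 7000.0 ms
print(total_latency_websocket(50, 100, 40))  # → 2100.0 ms
```

The fixed setup cost amortizes across the whole session, which is why per-request overhead drops so sharply for chatty, interactive workloads.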
Availability and Access
Current availability:
- Research preview for ChatGPT Pro users in Codex app, CLI, and VS Code extension
- Limited API access for design partners
- Separate rate limits during preview; usage doesn't count toward standard API limits
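For design partners with API access, a request might look like the following sketch. The model identifier string and the exact payload shape are assumptions based on OpenAI's existing Responses API conventions, not confirmed details from the announcement:

```python
import json

# Hypothetical Responses API payload; the announcement names the model
# GPT-5.3-Codex-Spark, but the exact API identifier is an assumption.
payload = {
    "model": "gpt-5.3-codex-spark",
    "input": "Rename the variable `cfg` to `config` in utils.py",
}

print(json.dumps(payload, indent=2))
```

During the preview, such requests would draw on the separate rate limits noted above rather than standard API quotas.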
Future roadmap: OpenAI plans to expand access as they optimize performance under real workloads and eventually introduce larger models, longer context windows, and multimodal input capabilities.
Safety and Infrastructure Notes
Codex-Spark includes the same safety training as OpenAI's mainline models and passed cybersecurity evaluations under the Preparedness Framework. The model runs on Cerebras' specialized hardware, which complements rather than replaces GPUs; GPUs remain the cost-effective choice for broad inference. The architecture allows GPUs and Cerebras hardware to be combined for optimal performance on specific workloads.