New Compact Models for Speed and Efficiency
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, optimized variants of GPT-5.4 designed for developers building latency-sensitive applications. GPT-5.4 mini delivers significant performance improvements over its GPT-5 mini predecessor across coding, reasoning, multimodal understanding, and tool use, while responding more than twice as fast. GPT-5.4 nano is the smallest and most cost-effective variant, suitable for classification, data extraction, ranking, and subagent tasks.
Performance Benchmarks
Both models deliver impressive results relative to their size:
- GPT-5.4 mini achieves 54.4% accuracy on SWE-Bench Pro (a software engineering benchmark), compared to 57.7% for full GPT-5.4 and 45.7% for GPT-5 mini
- GPT-5.4 nano reaches 52.4% on the same benchmark, with strong performance on Terminal-Bench 2.0 (46.3%) and multimodal tasks
- Both models excel at tool-calling and computer use, with GPT-5.4 mini achieving 72.1% accuracy on OSWorld-Verified (computer UI automation tasks)
Pricing and Availability
GPT-5.4 mini is available across multiple platforms:
- API: $0.75 per 1M input tokens, $4.50 per 1M output tokens, with 400k context window, supporting text, images, tool use, function calling, web search, file search, and computer use
- Codex: Consumes only 30% as much quota as GPT-5.4, making simpler coding tasks cheaper to handle and supporting efficient subagent delegation
- ChatGPT: Available to Free and Go users via the Thinking feature, and as a fallback option for other users running GPT-5.4 Thinking
GPT-5.4 nano is API-only: $0.20 per 1M input tokens, $1.25 per 1M output tokens.
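To make the price gap concrete, here is a minimal sketch of a per-request cost estimate built from the listed per-million-token rates. The model-name keys and token counts are illustrative assumptions, not official API identifiers:

```python
# Per-million-token USD rates from the pricing listed above.
PRICING = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical request: 10k input tokens, 2k output tokens.
mini_cost = estimate_cost("gpt-5.4-mini", 10_000, 2_000)  # 0.0165
nano_cost = estimate_cost("gpt-5.4-nano", 10_000, 2_000)  # 0.0045
```

At these rates, nano comes in at roughly a quarter of mini's cost for the same token mix, which is what makes it attractive for high-volume classification and extraction work.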
Use Cases and Architecture Patterns
The release enables new architectural patterns where larger models handle planning and coordination while delegating narrower subtasks to GPT-5.4 mini subagents running in parallel. This is particularly effective for coding assistants requiring responsive interaction, multi-step reasoning systems, and screenshot-based computer use applications. The models are optimized for scenarios where latency directly impacts user experience, making them ideal for real-time code completion, quick document processing, and rapid task execution at scale.
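The planner/subagent pattern described above can be sketched as follows. This is a hedged illustration only: `call_model` and `plan` are hypothetical stand-ins for real model calls (here they return labeled strings so the control flow runs as-is), but the fan-out structure mirrors the pattern, with a larger model decomposing the request and GPT-5.4 mini subagents handling narrow subtasks in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, task: str) -> str:
    # Hypothetical stand-in for an API call; returns a labeled result.
    return f"[{model}] {task}"

def plan(request: str) -> list[str]:
    # In practice the larger model would decompose the request;
    # fixed subtasks here keep the sketch self-contained.
    return [f"{request}: step {i}" for i in range(1, 4)]

def run(request: str) -> list[str]:
    subtasks = plan(request)  # planning done by the larger model
    # Narrow subtasks fan out to mini subagents running in parallel.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(
            lambda t: call_model("gpt-5.4-mini", t), subtasks))

results = run("refactor module")
```

The design point is the division of labor: the expensive model is invoked once for planning, while the cheap, fast variant absorbs the parallelizable work, keeping both cost and end-to-end latency low.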