OpenAI
OpenAI releases GPT-5.4 mini and nano; smaller models match larger model performance at 2x faster speeds and 67% lower costs
OpenAI API, ChatGPT · release, model, feature, API, performance · openai.com ↗

New Compact Models for Speed and Efficiency

OpenAI has released GPT-5.4 mini and GPT-5.4 nano, optimized variants of GPT-5.4 designed for developers building latency-sensitive applications. GPT-5.4 mini delivers significant improvements over its predecessor, GPT-5 mini, across coding, reasoning, multimodal understanding, and tool use, while responding more than twice as fast. GPT-5.4 nano is the smallest and most cost-effective variant, suited to classification, data extraction, ranking, and subagent tasks.

Performance Benchmarks

Both models deliver impressive results relative to their size:

  • GPT-5.4 mini achieves 54.4% accuracy on SWE-Bench Pro (a software engineering benchmark), compared to 57.7% for full GPT-5.4 and 45.7% for GPT-5 mini
  • GPT-5.4 nano reaches 52.4% on the same benchmark, with strong performance on Terminal-Bench 2.0 (46.3%) and multimodal tasks
  • Both models excel at tool-calling and computer use, with GPT-5.4 mini achieving 72.1% accuracy on OSWorld-Verified (computer UI automation tasks)

Pricing and Availability

GPT-5.4 mini is available across multiple platforms:

  • API: $0.75 per 1M input tokens, $4.50 per 1M output tokens, with 400k context window, supporting text, images, tool use, function calling, web search, file search, and computer use
  • Codex: Uses only 30% of the GPT-5.4 quota, making simpler coding tasks cheaper and subagent delegation more efficient
  • ChatGPT: Available to Free and Go users via the Thinking feature; fallback option for other users running GPT-5.4 Thinking

GPT-5.4 nano is API-only: $0.20 per 1M input tokens, $1.25 per 1M output tokens.
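The per-token prices above translate into per-request costs as follows. A minimal sketch, using only the prices quoted in this article; the model-name strings and the `estimate_cost` helper are illustrative, not part of any SDK:

```python
# USD per 1M tokens: (input, output), as quoted in the announcement above.
PRICES = {
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion.
mini_cost = estimate_cost("gpt-5.4-mini", 10_000, 2_000)  # $0.0165
nano_cost = estimate_cost("gpt-5.4-nano", 10_000, 2_000)  # $0.0045
```

At these prices, nano comes in at roughly a quarter of mini's cost for the same token counts, which is why the article positions it for high-volume tasks like classification and ranking.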

Use Cases and Architecture Patterns

The release enables new architectural patterns where larger models handle planning and coordination while delegating narrower subtasks to GPT-5.4 mini subagents running in parallel. This is particularly effective for coding assistants requiring responsive interaction, multi-step reasoning systems, and screenshot-based computer use applications. The models are optimized for scenarios where latency directly impacts user experience, making them ideal for real-time code completion, quick document processing, and rapid task execution at scale.
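The planner/subagent pattern described above can be sketched with a simple parallel fan-out. This is an assumption-laden illustration: `call_subagent` is a hypothetical stand-in for a real GPT-5.4 mini API call, and the task strings are invented examples:

```python
import concurrent.futures

def call_subagent(task: str) -> str:
    """Hypothetical stand-in for a GPT-5.4 mini request.
    In a real system this would invoke your API client with the subtask
    as the prompt; here it just echoes the task so the sketch is runnable."""
    return f"result for: {task}"

def run_plan(subtasks: list[str], max_workers: int = 8) -> list[str]:
    """Fan narrow subtasks out to parallel subagents; gather results in order.
    The larger model would produce `subtasks` during its planning step."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_subagent, subtasks))

results = run_plan(["extract dates", "rank candidates", "classify intent"])
```

Thread-based fan-out fits here because each subagent call is I/O-bound: the coordinator spends its time waiting on network responses, so parallelism directly cuts end-to-end latency.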