OpenAI
OpenAI releases GPT-5.4 mini and nano; smaller models approach GPT-5.4 coding performance at more than 2x the speed
OpenAI API · ChatGPT · OpenAI · release · model · feature · api · performance · openai.com ↗

Two New Compact Models for Speed-Critical Applications

OpenAI has released GPT-5.4 mini and GPT-5.4 nano, two new smaller models optimized for workloads where latency and cost are paramount. These models bring much of GPT-5.4's capability to faster, more efficient packages suitable for coding assistants, subagent systems, and multimodal applications that require real-time responsiveness.

Performance and Benchmarks

GPT-5.4 mini significantly improves over GPT-5 mini across multiple dimensions:

  • Coding: Achieves 54.4% accuracy on SWE-Bench Pro (vs. 45.7% for GPT-5 mini), approaching GPT-5.4's 57.7%
  • Tool use: Scores 42.9% on Toolathlon and 57.7% on MCP Atlas
  • Speed: Runs more than 2x faster than GPT-5 mini
  • Computer use: Reaches 72.1% on OSWorld-Verified, nearly matching GPT-5.4's 75.0%

GPT-5.4 nano serves as the smallest and cheapest option:

  • Recommended for classification, data extraction, ranking, and simpler coding subagents
  • Significant upgrade over GPT-5 nano across all evaluated tasks
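Given the recommendation to reserve nano for lightweight tasks, a simple task-based router can pick the model tier per request. This is a minimal sketch; the model identifiers `gpt-5.4-mini` and `gpt-5.4-nano` are assumed from the announcement, and the task categories follow the list above.

```python
# Sketch of a task-based model router. Model identifiers are assumed
# from the announcement; check the API docs for the actual names.
LIGHTWEIGHT_TASKS = {"classification", "extraction", "ranking"}

def pick_model(task: str) -> str:
    """Route simple tasks to nano; heavier coding/tool work goes to mini."""
    return "gpt-5.4-nano" if task in LIGHTWEIGHT_TASKS else "gpt-5.4-mini"
```

A dispatcher like this keeps per-request cost down without hard-coding a single model into the whole pipeline.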

Availability and Pricing

GPT-5.4 mini is available today across three platforms:

  • API: Supports text/image inputs, tool use, function calling, web search, file search, computer use, and skills. 400k context window. $0.75 per 1M input tokens and $4.50 per 1M output tokens
  • Codex: Consumes only 30% as much quota as GPT-5.4, making coding tasks cheaper to run. Available across the Codex app, CLI, IDE extensions, and web
  • ChatGPT: Available to Free and Go users via the "Thinking" feature; other users receive it as a fallback for GPT-5.4 Thinking

GPT-5.4 nano is API-only: $0.20 per 1M input tokens and $1.25 per 1M output tokens
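The per-token prices above translate directly into a per-request cost estimate. The sketch below just applies the listed rates; the model identifiers are assumptions based on the announcement.

```python
# Per-1M-token prices (USD) as listed above; model names are assumed.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For example, a mini request with 10,000 input tokens and 2,000 output tokens would run about $0.0165, while the same request on nano would be about $0.0045.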

Use Cases and Architecture Patterns

These models excel in latency-sensitive scenarios where response time directly impacts user experience. Developers can now compose multi-model systems where larger models handle planning and coordination while delegating focused subtasks to GPT-5.4 mini subagents running in parallel—for example, having GPT-5.4 manage overall strategy while mini models search codebases, review files, or process supporting documents simultaneously. This pattern scales more efficiently as smaller models improve in both speed and capability.
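The planner/subagent pattern above can be sketched with plain `asyncio` fan-out. Here `run_subagent` is a hypothetical stand-in for an API call to a GPT-5.4 mini subagent; the point is that independent subtasks run concurrently, so total latency tracks the slowest subtask rather than the sum of all of them.

```python
import asyncio

async def run_subagent(task: str) -> str:
    # Placeholder for a real network call to a gpt-5.4-mini subagent.
    await asyncio.sleep(0)
    return f"result for {task!r}"

async def fan_out(tasks: list[str]) -> list[str]:
    # Launch all subtasks concurrently and collect results in order.
    return await asyncio.gather(*(run_subagent(t) for t in tasks))

results = asyncio.run(
    fan_out(["search codebase", "review files", "summarize docs"])
)
```

In a full system, the coordinator model would generate the task list and synthesize the gathered results into a final answer.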