Two New Compact Models for Speed-Critical Applications
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, two new smaller models optimized for workloads where latency and cost are paramount. These models bring much of GPT-5.4's capability to faster, more efficient packages suitable for coding assistants, subagent systems, and multimodal applications that require real-time responsiveness.
Performance and Benchmarks
GPT-5.4 mini significantly improves over GPT-5 mini across multiple dimensions:
- Coding: Achieves 54.4% accuracy on SWE-Bench Pro (vs. 45.7% for GPT-5 mini), approaching GPT-5.4's 57.7%
- Tool use: Scores 42.9% on Toolathlon and 57.7% on MCP Atlas
- Speed: Runs more than 2x faster than GPT-5 mini
- Computer use: Reaches 72.1% on OSWorld-Verified, nearly matching GPT-5.4's 75.0%
GPT-5.4 nano serves as the smallest and cheapest option:
- Recommended for classification, data extraction, ranking, and simpler coding subagents
- Significant upgrade over GPT-5 nano across all evaluated tasks
Availability and Pricing
GPT-5.4 mini is available today across three platforms:
- API: Supports text/image inputs, tool use, function calling, web search, file search, computer use, and skills. 400k context window. $0.75 per 1M input tokens and $4.50 per 1M output tokens
- Codex: Consumes only 30% as much quota as GPT-5.4, making it a cheaper option for coding tasks. Available across the Codex app, CLI, IDE extensions, and web
- ChatGPT: Available to Free and Go users via the "Thinking" feature; other users receive it as a fallback for GPT-5.4 Thinking
GPT-5.4 nano is API-only: $0.20 per 1M input tokens and $1.25 per 1M output tokens
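Using the list prices above, per-request costs are easy to estimate. The snippet below is a back-of-the-envelope helper, not an SDK call; the model identifiers are shorthand for the two models and may not match the actual API model IDs.

```python
# USD per 1M tokens, as quoted in this announcement.
# Model keys are illustrative labels, not official API model IDs.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10k-token prompt producing a 2k-token completion.
print(estimate_cost("gpt-5.4-mini", 10_000, 2_000))  # 0.0165
print(estimate_cost("gpt-5.4-nano", 10_000, 2_000))  # 0.0045
```

At these rates, nano is roughly a quarter the cost of mini for the same token mix, which is why the announcement steers classification and extraction workloads toward it.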
Use Cases and Architecture Patterns
These models excel in latency-sensitive scenarios where response time directly impacts user experience. Developers can now compose multi-model systems in which larger models handle planning and coordination while delegating focused subtasks to GPT-5.4 mini subagents running in parallel. For example, GPT-5.4 can manage overall strategy while mini models simultaneously search codebases, review files, or process supporting documents. This pattern scales more efficiently as smaller models improve in both speed and capability.
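The planner/subagent pattern can be sketched as follows. This is a minimal illustration under stated assumptions: `run_subagent` is a placeholder for a real mini-model API call (the announcement does not prescribe an SDK or call signature), and the subtask list stands in for output the planning model would produce.

```python
# Sketch of the pattern: a larger model plans, then fans focused subtasks
# out to fast mini-model subagents running in parallel.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Placeholder: in a real system this would call the mini model with a
    # narrowly scoped prompt for this one subtask and return its answer.
    return f"[mini-model result for: {task}]"

def delegate(subtasks: list[str]) -> list[str]:
    # Subagent calls are I/O-bound network requests, so a thread pool is
    # enough to run them concurrently; results come back in input order.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run_subagent, subtasks))

# Subtasks here are hypothetical output from the planning model.
results = delegate([
    "search the codebase for config parsing",
    "review the changed test files",
    "summarize the design document",
])
```

Because each subagent handles a small, independent task, overall wall-clock time is bounded by the slowest subtask rather than the sum of all of them, which is where the mini model's latency advantage compounds.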