New Compact Models for Speed and Efficiency
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, optimized variants of GPT-5.4 designed for developers building latency-sensitive applications. GPT-5.4 mini delivers significant performance improvements over its GPT-5 mini predecessor across coding, reasoning, multimodal understanding, and tool use, while responding more than twice as fast. GPT-5.4 nano is the smallest and most cost-effective variant, suitable for classification, data extraction, ranking, and subagent tasks.
Performance Benchmarks
Both models deliver impressive results relative to their size:
- GPT-5.4 mini achieves 54.4% accuracy on SWE-Bench Pro (a software engineering benchmark), compared to 57.7% for full GPT-5.4 and 45.7% for GPT-5 mini
- GPT-5.4 nano reaches 52.4% on the same benchmark, with strong performance on Terminal-Bench 2.0 (46.3%) and multimodal tasks
- Both models excel at tool-calling and computer use, with GPT-5.4 mini achieving 72.1% accuracy on OSWorld-Verified (computer UI automation tasks)
Pricing and Availability
GPT-5.4 mini is available across multiple platforms:
- API: $0.75 per 1M input tokens, $4.50 per 1M output tokens, with 400k context window, supporting text, images, tool use, function calling, web search, file search, and computer use
- Codex: Consumes only 30% as much quota as GPT-5.4, making simpler coding tasks cheaper to handle and supporting efficient subagent delegation
- ChatGPT: Available to Free and Go users via the Thinking feature, and as a fallback option for other users running GPT-5.4 Thinking
GPT-5.4 nano is API-only: $0.20 per 1M input tokens, $1.25 per 1M output tokens.
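To make the price gap concrete, here is a minimal sketch of a per-request cost estimate built from the listed per-million-token rates. The model-name keys and token counts are illustrative assumptions, not official API identifiers:

```python
# Per-million-token USD rates from the pricing listed above.
PRICING = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical request: 10k input tokens, 2k output tokens.
mini_cost = estimate_cost("gpt-5.4-mini", 10_000, 2_000)  # 0.0165
nano_cost = estimate_cost("gpt-5.4-nano", 10_000, 2_000)  # 0.0045
```

At these rates, nano comes in at roughly a quarter of mini's cost for the same token mix, which is what makes it attractive for high-volume classification and extraction work.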
Use Cases and Architecture Patterns
The release enables new architectural patterns where larger models handle planning and coordination while delegating narrower subtasks to GPT-5.4 mini subagents running in parallel. This is particularly effective for coding assistants requiring responsive interaction, multi-step reasoning systems, and screenshot-based computer use applications. The models are optimized for scenarios where latency directly impacts user experience, making them ideal for real-time code completion, quick document processing, and rapid task execution at scale.
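The planner/subagent pattern described above can be sketched as follows. This is a hedged illustration only: `call_model` and `plan` are hypothetical stand-ins for real model calls (here they return labeled strings so the control flow runs as-is), but the fan-out structure mirrors the pattern, with a larger model decomposing the request and GPT-5.4 mini subagents handling narrow subtasks in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, task: str) -> str:
    # Hypothetical stand-in for an API call; returns a labeled result.
    return f"[{model}] {task}"

def plan(request: str) -> list[str]:
    # In practice the larger model would decompose the request;
    # fixed subtasks here keep the sketch self-contained.
    return [f"{request}: step {i}" for i in range(1, 4)]

def run(request: str) -> list[str]:
    subtasks = plan(request)  # planning done by the larger model
    # Narrow subtasks fan out to mini subagents running in parallel.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(
            lambda t: call_model("gpt-5.4-mini", t), subtasks))

results = run("refactor module")
```

The design point is the division of labor: the expensive model is invoked once for planning, while the cheap, fast variant absorbs the parallelizable work, keeping both cost and end-to-end latency low.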