OpenAI releases GPT-5.4 with computer-use capabilities; achieves 83% win rate on professional knowledge work

Overview

OpenAI has released GPT-5.4, the company's most capable frontier model for professional work, now available in ChatGPT (as GPT-5.4 Thinking and GPT-5.4 Pro), the OpenAI API, and Codex. The model combines advances in reasoning, coding, and agentic workflows with native computer-use capabilities, which let AI agents operate computers and execute complex workflows across applications.

Key Capabilities and Performance

Knowledge Work & Professional Tasks

Achieves 83.0% win rate on GDPval, matching or exceeding human professionals across 44 occupations
87.3% mean score on spreadsheet modeling (vs. 68.4% for GPT-5.2)
Human raters prefer GPT-5.4 presentations 68% of the time due to superior aesthetics and visual design
Makes 33% fewer factual errors than GPT-5.2, and its responses are 18% less likely to contain any mistakes

Computer Use & Vision

Native computer-use capabilities with state-of-the-art 75.0% success rate on OSWorld-Verified (exceeding human performance at 72.4%)
Excellent at writing code for browser/desktop automation via Playwright and similar libraries
Supports 1M token context for agents to plan, execute, and verify tasks across long horizons
Steerable behavior via developer messages with configurable safety policies

Coding & Efficiency

Inherits industry-leading coding capabilities from GPT-5.3-Codex
54.6% on Toolathlon benchmark (new tool-use evaluation)
Most token-efficient reasoning model yet, delivering faster speeds and reduced costs compared to GPT-5.2
Improved tool search functionality helping agents find and use the right tools more efficiently

Developer Impact & Action Items

ChatGPT Users: GPT-5.4 Thinking presents its plan up front, so users can adjust course mid-response and get better-aligned output without additional turns. This is particularly useful for deep web research and for maintaining context across longer conversations.

API & Codex Developers: GPT-5.4 is the first general-purpose model with native computer-use capabilities, enabling reliable agents that operate across websites and software systems. Developers can configure safety behavior via custom confirmation policies to match their risk tolerance.
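To make the idea of a custom confirmation policy concrete, here is a minimal sketch in Python of how an agent loop might gate actions by risk. It is not the OpenAI API: the names `Risk`, `Action`, `ConfirmationPolicy`, and `run_action` are hypothetical and exist only for illustration, and the real configuration surface may look quite different.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    """Hypothetical risk tiers for actions an agent might take."""
    LOW = 1      # e.g. reading a page or taking a screenshot
    MEDIUM = 2   # e.g. filling in a form
    HIGH = 3     # e.g. submitting a payment or deleting files


@dataclass(frozen=True)
class Action:
    name: str
    risk: Risk


@dataclass(frozen=True)
class ConfirmationPolicy:
    """Hypothetical policy: confirm at or above `confirm_at`, never run `blocked`."""
    confirm_at: Risk = Risk.HIGH
    blocked: frozenset = frozenset()

    def decide(self, action: Action) -> str:
        if action.name in self.blocked:
            return "deny"
        if action.risk.value >= self.confirm_at.value:
            return "confirm"
        return "allow"


def run_action(action: Action,
               policy: ConfirmationPolicy,
               ask_user: Callable[[Action], bool]) -> bool:
    """Return True if the action may proceed under the policy."""
    decision = policy.decide(action)
    if decision == "deny":
        return False
    if decision == "confirm" and not ask_user(action):
        return False
    # A real agent would execute the action here; this sketch only gates it.
    return True


# A stricter policy for a low risk tolerance: confirm anything MEDIUM or above.
strict = ConfirmationPolicy(confirm_at=Risk.MEDIUM,
                            blocked=frozenset({"delete_files"}))
print(strict.decide(Action("fill_form", Risk.MEDIUM)))
```

Lowering `confirm_at` trades autonomy for oversight: a team with a higher risk tolerance could keep the default of `Risk.HIGH`, while a more cautious deployment would confirm at `Risk.MEDIUM` and block destructive actions outright.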

Enterprise Customers: OpenAI has released a new ChatGPT for Excel add-in for enhanced spreadsheet work, alongside updated spreadsheet and presentation skills available in Codex and the API.

A higher-tier GPT-5.4 Pro variant is also available for users requiring maximum performance on complex tasks.
