OpenAI releases GPT-5.4, which matches or exceeds industry professionals in 83% of GDPval comparisons

Key Capabilities

OpenAI has released GPT-5.4 across ChatGPT (as GPT-5.4 Thinking), the API, and Codex. The model represents a significant step forward in professional knowledge work, combining advances in reasoning, coding, and agentic workflows.

GPT-5.4's core improvements include:

Computer-use capabilities: GPT-5.4 is the first general-purpose model with native, state-of-the-art computer-use capabilities. Agents can now operate computers and execute complex workflows across applications, and the model supports up to 1M tokens of context.
Mid-response adjustment in ChatGPT: GPT-5.4 Thinking can now provide an upfront plan of its thinking, allowing users to adjust course mid-response and arrive at better outputs without additional turns.
Improved factuality: GPT-5.4's individual claims are 33% less likely to be false and full responses are 18% less likely to contain any errors compared to GPT-5.2.
Token efficiency: GPT-5.4 is OpenAI's most token-efficient reasoning model yet, using significantly fewer tokens to solve problems, which reduces costs and improves speed.

Professional Knowledge Work

GPT-5.4 achieves state-of-the-art performance on knowledge work tasks. On the GDPval benchmark—which tests agents' abilities across 44 occupations in the top 9 GDP-contributing industries—GPT-5.4 matches or exceeds industry professionals in 83.0% of comparisons, up from 70.9% for GPT-5.2.

The model demonstrates particular strength in spreadsheet, presentation, and web research work:

Spreadsheets: Achieves 87.3% on internal benchmarks for spreadsheet modeling tasks, compared to 68.4% for GPT-5.2.
Presentations: Human raters preferred presentations generated by GPT-5.4 over those from GPT-5.2 68.0% of the time, citing superior aesthetics and visual variety.
Deep web research: Improved, particularly for highly specific queries, while maintaining context across longer thinking sessions.

Agent and Developer Capabilities

GPT-5.4 introduces tool search functionality, helping agents find and use the right tools more efficiently without sacrificing intelligence. The model excels at writing code to operate computers via libraries like Playwright and can issue mouse and keyboard commands in response to screenshots.

Developers can steer the model's behavior via developer messages and configure custom confirmation policies to match their risk tolerance. On the OSWorld-Verified benchmark—which measures computer-use capabilities—GPT-5.4 achieves 75.0% compared to 47.3% for GPT-5.2.
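
To make the idea of a confirmation policy concrete, here is a minimal sketch of how an agent harness might gate actions by risk level. It is illustrative only: the names Risk, Action, ConfirmationPolicy, and run_action are hypothetical and are not part of the OpenAI API, whose actual configuration interface is not described in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    """Hypothetical risk tiers for computer-use actions."""
    LOW = 1      # e.g. taking a screenshot or scrolling
    MEDIUM = 2   # e.g. typing into a form field
    HIGH = 3     # e.g. submitting a payment or deleting a file


@dataclass(frozen=True)
class Action:
    name: str
    risk: Risk


@dataclass(frozen=True)
class ConfirmationPolicy:
    # Actions at or above this tier require explicit human approval.
    confirm_at: Risk

    def needs_confirmation(self, action: Action) -> bool:
        return action.risk.value >= self.confirm_at.value


def run_action(
    action: Action,
    policy: ConfirmationPolicy,
    approve: Callable[[Action], bool],
) -> str:
    """Return "executed" if the action may proceed, otherwise "blocked".

    The approve callback stands in for whatever prompt the application
    shows to a human reviewer; it is only called when the policy
    requires confirmation.
    """
    if policy.needs_confirmation(action) and not approve(action):
        return "blocked"
    return "executed"


# A cautious deployment confirms anything MEDIUM or above.
cautious = ConfirmationPolicy(confirm_at=Risk.MEDIUM)
# A permissive deployment confirms only HIGH-risk actions.
permissive = ConfirmationPolicy(confirm_at=Risk.HIGH)
```

In this sketch, a team with low risk tolerance would pass the cautious policy, so that even form input waits for approval, while a sandboxed test environment might use the permissive one.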

Availability

GPT-5.4 and GPT-5.4 Pro are available now in ChatGPT and the API. Enterprise customers can access the new ChatGPT for Excel add-in, also launched today. Updated spreadsheet and presentation skills are available in Codex and the API.
