OpenAI releases GPT-5.4 with computer-use capabilities and an 83% win rate on knowledge work tasks

Overview

OpenAI has released GPT-5.4, a new frontier model designed for professional work and agent automation. The model is available across ChatGPT (as GPT-5.4 Thinking and GPT-5.4 Pro), the OpenAI API, and Codex. GPT-5.4 builds on recent advances in reasoning, coding, and agentic workflows while delivering improvements in token efficiency and real-world task completion.

Key Improvements

Knowledge Work & Professional Tasks

Achieves an 83.0% win rate against industry professionals on the GDPval benchmark (up from 70.9% for GPT-5.2)
Scores 87.3% on spreadsheet modeling tasks (up from 68.4%), and human raters preferred its presentations over GPT-5.2's 68.0% of the time
Makes 33% fewer false individual claims and produces 18% fewer responses containing any error, compared to GPT-5.2
Improved performance on document-heavy tasks including legal analysis and financial modeling
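
The figures above mix absolute scores with relative reductions, so it can help to separate percentage-point gains from relative gains. The following is a minimal sketch of that arithmetic using only the scores quoted above; the helper name `gain` is illustrative and not part of any OpenAI tooling.

```python
from typing import Tuple


def gain(old: float, new: float) -> Tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 1 decimal."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


# Scores quoted in the article (prior score first, GPT-5.4 score second).
gdpval = gain(70.9, 83.0)        # (12.1, 17.1)
spreadsheet = gain(68.4, 87.3)   # (18.9, 27.6)

print(f"GDPval: +{gdpval[0]} points, +{gdpval[1]}% relative")
print(f"Spreadsheet modeling: +{spreadsheet[0]} points, +{spreadsheet[1]}% relative")
```

Read this way, the GDPval result is a 12.1-point gain, or roughly a 17% relative improvement over the earlier score.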

Computer Use & Vision

OpenAI's first general-purpose model with native computer-use capabilities, enabling agents to operate computers and carry out complex workflows
Achieves 75.0% on the OSWorld-Verified benchmark (above the human baseline of 72.4% and up from 47.3% for GPT-5.2)
Performs well both at writing code that drives applications through Playwright and at issuing mouse and keyboard commands based on screenshots
Supports up to 1M tokens of context for long-horizon task planning and execution
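
To make the screenshot-driven mode concrete, here is a minimal sketch of the observe-decide-act loop that a computer-use agent typically runs. It is an illustration under stated assumptions, not OpenAI's implementation: the names `Click`, `TypeText`, `Done`, `FakeDesktop`, `run_agent`, and `scripted_policy` are all hypothetical, and the model is replaced by a scripted stand-in so the example runs without any API or browser.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Union


@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Done:
    summary: str


Action = Union[Click, TypeText, Done]


class FakeDesktop:
    """In-memory stand-in for a real screen (hypothetical).

    A Playwright-backed version could map these methods onto
    page.screenshot(), page.mouse.click(x, y) and page.keyboard.type(text).
    """

    def __init__(self) -> None:
        self.events: List[str] = []

    def screenshot(self) -> bytes:
        # A real desktop would return image bytes; here the frame id is enough.
        return f"frame-{len(self.events)}".encode()

    def click(self, x: int, y: int) -> None:
        self.events.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.events.append(f"type({text!r})")


def run_agent(
    desktop: FakeDesktop,
    choose_action: Callable[[bytes], Action],
    max_steps: int = 10,
) -> Optional[str]:
    """Take a screenshot, ask for an action, apply it, and repeat until Done."""
    for _ in range(max_steps):
        action = choose_action(desktop.screenshot())
        if isinstance(action, Done):
            return action.summary
        if isinstance(action, Click):
            desktop.click(action.x, action.y)
        elif isinstance(action, TypeText):
            desktop.type_text(action.text)
    return None  # step budget exhausted without finishing


def scripted_policy(script: Iterable[Action]) -> Callable[[bytes], Action]:
    """Stand-in for the model: replays a fixed list of actions."""
    remaining = iter(script)

    def choose(_screenshot: bytes) -> Action:
        return next(remaining, Done("script exhausted"))

    return choose
```

Calling `run_agent(FakeDesktop(), scripted_policy([Click(10, 20), TypeText("hello"), Done("form filled")]))` returns `"form filled"` after recording one click and one typing event. In a real integration, `choose_action` would send the screenshot to the model and parse its reply, and the step budget guards against loops that never terminate.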

Developer Features

New tool search capability helps agents find and use the right tools more efficiently
Developers can configure custom confirmation policies and safety behavior via developer messages
OpenAI's most token-efficient reasoning model yet, reducing token usage and improving response speed
Improved performance across large ecosystems of tools and connectors
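
The article does not describe how tool search or confirmation policies are exposed, so the following is only a conceptual sketch of the two ideas: narrowing a large tool catalog to the few entries relevant to a request, and gating risky tools behind a developer-chosen confirmation rule. The `Tool` type, the keyword-overlap ranking, and the policy names `"always"`, `"never"`, and `"risky_only"` are all assumptions made for illustration, not OpenAI's API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    risky: bool = False  # hypothetical flag: side effects that may need sign-off


def search_tools(tools: Sequence[Tool], query: str, limit: int = 3) -> List[Tool]:
    """Rank tools by how many query words appear in their name or description."""
    words = set(query.lower().split())
    scored = []
    for tool in tools:
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        score = len(words & set(text.split()))
        if score:
            scored.append((score, tool))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [tool for _, tool in scored[:limit]]


def needs_confirmation(tool: Tool, policy: str) -> bool:
    """Decide whether a human must approve the call under the given policy."""
    if policy == "always":
        return True
    if policy == "never":
        return False
    if policy == "risky_only":
        return tool.risky
    raise ValueError(f"unknown policy: {policy!r}")


CATALOG = [
    Tool("send_email", "send an email message to a recipient", risky=True),
    Tool("read_calendar", "list calendar events"),
    Tool("create_invoice", "create and send an invoice", risky=True),
]

matches = search_tools(CATALOG, "send email")
print([t.name for t in matches])  # ['send_email', 'create_invoice']
print(needs_confirmation(matches[0], "risky_only"))  # True
```

Keyword overlap is the simplest possible ranking; a production system would more likely use embeddings or model-driven retrieval, but the shape is the same: the agent sees a short, relevant list rather than the full catalog.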

Availability & Performance

GPT-5.4 is immediately available in ChatGPT and through the API. The release also powers improvements to ChatGPT for Excel (launched today) and comes with updated spreadsheet and presentation skills in Codex. Reported benchmark scores include SWE-Bench Pro (57.7%), Toolathlon (54.6%), and BrowseComp (82.7%).

Getting Started

Users can access GPT-5.4 Thinking in ChatGPT, which now presents an upfront plan of its thinking so that users can course-correct mid-response. API users can integrate the model for computer-use workflows, agent automation, and professional knowledge work. Enterprise customers should review the new ChatGPT for Excel add-in and the updated skill libraries.
