OpenAI releases GPT-5.4, delivering 83% parity with professionals on knowledge work

GPT-5.4: A New Standard for Professional AI

OpenAI has released GPT-5.4, alongside a premium GPT-5.4 Pro tier, marking a major advancement in frontier AI capabilities. The model is now available across ChatGPT (as GPT-5.4 Thinking), the OpenAI API, and Codex. GPT-5.4 integrates the company's recent breakthroughs in reasoning, coding, and agentic workflows into a single model optimized for professional work at scale.

Knowledge Work and Accuracy

On the GDPval benchmark, which evaluates agents on knowledge work tasks spanning 44 occupations and real deliverables like sales presentations, accounting spreadsheets, and manufacturing diagrams, GPT-5.4 achieves an 83.0% win rate against industry professionals, up from GPT-5.2's 70.9%. The model performs particularly well on professional documents:

Spreadsheets: 87.3% mean score on junior investment banking analyst tasks (vs. 68.4% for GPT-5.2)
Presentations: 68% human preference rate for visual quality, aesthetics, and image generation
Factuality: 33% reduction in false claims and 18% reduction in error-containing responses versus GPT-5.2

Enterprise users can now leverage the new ChatGPT for Excel add-in, launched today alongside updated spreadsheet and presentation skills for Codex and the API.

Computer Use and Agentic Capabilities

GPT-5.4 is the first general-purpose OpenAI model with native computer-use capabilities, enabling agents to operate computers and automate complex workflows across applications. Key technical improvements include:

75.0% success rate on OSWorld-Verified (desktop navigation via screenshots and keyboard/mouse), exceeding the reported human performance of 72.4%
1M token context window for long-horizon planning and task execution
Tool search, which lets agents efficiently find and use the right integrations without sacrificing reasoning quality
Steerable behavior via developer messages, with configurable safety policies for different risk tolerances

The model excels at writing code that drives applications through libraries like Playwright and at executing complex multi-step workflows across software ecosystems.

Efficiency and Reasoning

GPT-5.4 is OpenAI's most token-efficient reasoning model to date, delivering faster inference and lower token usage than GPT-5.2 while matching or exceeding its performance. In ChatGPT, GPT-5.4 Thinking now presents an upfront plan of its thinking, so users can adjust course mid-response and reach the desired output without additional turns. The model also improves deep web research for highly specific queries while maintaining context across longer thinking horizons.

Benchmarks and Real-World Performance

Beyond GDPval, GPT-5.4 improves on GPT-5.2 across several benchmarks, with gains ranging from about 2 to about 17 percentage points:

SWE-Bench Pro (Public): 57.7% (vs. 55.6% for GPT-5.2)
Toolathlon: 54.6% (vs. 46.3% for GPT-5.2)
BrowseComp: 82.7% (vs. 65.8% for GPT-5.2)
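
To put these figures side by side, the short Python sketch below computes the absolute gain (in percentage points) and the relative gain over GPT-5.2 for each benchmark. The scores are copied from the list above; the `SCORES` table and the `gains` helper are illustrative names, not part of any OpenAI tooling.

```python
# Scores in percent, copied from the list above: (GPT-5.4, GPT-5.2).
SCORES = {
    "SWE-Bench Pro (Public)": (57.7, 55.6),
    "Toolathlon": (54.6, 46.3),
    "BrowseComp": (82.7, 65.8),
}


def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Both values are rounded to one decimal place, matching the precision
    of the reported scores.
    """
    points = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return points, relative


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        points, relative = gains(new, old)
        print(f"{name}: +{points} points ({relative}% relative to GPT-5.2)")
```

Read this way, the improvement is uneven: BrowseComp rises by 16.9 points and Toolathlon by 8.3, while SWE-Bench Pro (Public) moves by 2.1.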

Early adopters including Mercor (APEX-Agents leaderboard leader) and Harvey (91% on BigLaw Bench) report superior performance on specialized professional services tasks, faster execution, and lower operational costs than competing frontier models.

Availability

GPT-5.4 is available now to ChatGPT users and API customers. The base GPT-5.4 model and premium GPT-5.4 Pro tier are ready for integration into agentic workflows and professional applications requiring advanced reasoning, coding, and document automation capabilities.
