OpenAI releases GPT-5.4 with native computer-use capabilities and a 57.7% score on the SWE-Bench Pro coding benchmark

Key Capabilities

OpenAI's GPT-5.4 combines advances in reasoning, coding, and agentic workflows into a single frontier model. The release includes:

Native computer-use capabilities: GPT-5.4 is OpenAI's first general-purpose model with built-in computer use. Agents can operate computers and carry out complex workflows across applications by issuing mouse and keyboard commands and interpreting screenshots.
Improved knowledge work: GPT-5.4 achieves an 83% win rate on GDPval, a benchmark spanning 44 professional occupations. On spreadsheet modeling tasks it scores 87.3%, a significant jump from GPT-5.2's 68.4%.
Better reasoning transparency: In ChatGPT, GPT-5.4 Thinking displays its thinking upfront, letting users adjust course mid-response before the final output is generated.
Token efficiency: OpenAI describes GPT-5.4 as its most token-efficient reasoning model yet, using fewer tokens and responding faster than GPT-5.2.
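The computer-use capability described above follows an observe-act loop: the agent looks at a screenshot, decides on a mouse or keyboard action, executes it, and repeats. The Python sketch below illustrates only that loop shape. It does not call OpenAI's API; `FakeScreen`, `Action`, and `scripted_policy` are hypothetical stand-ins for a real desktop and for the model, and the "screenshot" is a text summary rather than an image.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One low-level UI command (hypothetical schema, not OpenAI's)."""
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Toy stand-in for a desktop: tracks which field is focused and what was typed."""
    focused_field: Optional[str] = None
    typed: dict = field(default_factory=dict)

    def screenshot(self) -> str:
        # A real harness would capture pixels; here we return a text summary.
        return f"focused={self.focused_field} typed={self.typed}"

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            # Pretend a search box lives at screen coordinates (100, 50).
            self.focused_field = "search" if (action.x, action.y) == (100, 50) else None
        elif action.kind == "type" and self.focused_field:
            previous = self.typed.get(self.focused_field, "")
            self.typed[self.focused_field] = previous + action.text


Policy = Callable[[str], Action]


def run_agent(policy: Policy, screen: FakeScreen, max_steps: int = 10) -> int:
    """Observe, decide, act until the policy signals "done"; return the step count."""
    for step in range(max_steps):
        observation = screen.screenshot()
        action = policy(observation)
        if action.kind == "done":
            return step
        screen.execute(action)
    return max_steps


def scripted_policy(observation: str) -> Action:
    """Hypothetical stand-in for the model: picks the next action from the screenshot."""
    if "focused=None" in observation:
        return Action("click", 100, 50)
    if "quarterly report" not in observation:
        return Action("type", text="quarterly report")
    return Action("done")


screen = FakeScreen()
steps = run_agent(scripted_policy, screen)
print(steps, screen.typed)
```

In a real deployment the policy would be the model reading actual screenshots, and the `max_steps` cap is one simple guard against an agent looping indefinitely.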

Specific Improvements

For Developers and API Users: GPT-5.4 supports up to 1M tokens of context for complex multi-step tasks. Its native computer-use capabilities reach a state-of-the-art 75% success rate on the OSWorld-Verified benchmark, exceeding the 72.4% human baseline. A new "tool search" feature helps agents find and use the right tools more efficiently.
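The announcement does not explain how tool search works internally. As a hedged illustration of the general idea, where an agent retrieves only the few relevant tools from a large registry instead of loading every definition into context, the sketch below ranks tools by simple word overlap with a query. The `Tool` type, `search_tools` function, and the registry entries are all hypothetical, and keyword overlap is an assumed stand-in; a production system might use embeddings or another retrieval method.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool definition: a name plus a natural-language description."""
    name: str
    description: str


def tokenize(text: str) -> set[str]:
    # Lowercase and split on non-alphanumerics, so "create_chart" -> {"create", "chart"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(query: str, registry: list[Tool], top_k: int = 2) -> list[str]:
    """Return names of the top_k tools sharing the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for tool in registry:
        overlap = len(query_words & tokenize(tool.name + " " + tool.description))
        if overlap > 0:
            scored.append((overlap, tool.name))
    # Highest overlap first; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


REGISTRY = [
    Tool("send_email", "Send an email message to a recipient"),
    Tool("create_chart", "Create a chart from spreadsheet data"),
    Tool("read_spreadsheet", "Read cells from a spreadsheet file"),
    Tool("book_flight", "Book a flight for a traveler"),
]

print(search_tools("chart the spreadsheet data", REGISTRY))
```

Only the returned tools would then be exposed to the model, which keeps the prompt small even when the registry holds hundreds of definitions.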

For Knowledge Work: Human raters preferred GPT-5.4's presentations over GPT-5.2's 68% of the time, owing to stronger aesthetics and visual design. The model is also 33% less likely to make false claims and 18% less likely to produce responses containing errors.

For Code Generation: On SWE-Bench Pro, GPT-5.4 achieves 57.7%, improving over GPT-5.3-Codex at 56.8%.
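Putting the reported scores side by side shows how uneven the gains are: the spreadsheet modeling jump is large, while the SWE-Bench Pro improvement is under one percentage point. The short Python calculation below uses only figures stated in this article; the `point_gain` helper is illustrative.

```python
def point_gain(new: float, old: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal place."""
    return round(new - old, 1)


# (comparison, GPT-5.4 score, reference score), all taken from this article.
REPORTED = [
    ("Spreadsheet modeling vs GPT-5.2", 87.3, 68.4),
    ("OSWorld-Verified vs human baseline", 75.0, 72.4),
    ("SWE-Bench Pro vs GPT-5.3-Codex", 57.7, 56.8),
]

for label, new, old in REPORTED:
    print(f"{label}: +{point_gain(new, old)} points")
```

This yields gains of 18.9, 2.6, and 0.9 percentage points respectively. Rounding matters here because binary floating point leaves residue such as 18.899999999999991.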

Availability and Tools

GPT-5.4 is available now in ChatGPT (as GPT-5.4 Thinking and GPT-5.4 Pro), the OpenAI API, and Codex. Enterprise customers can use the newly released ChatGPT for Excel add-in. Updated spreadsheet and presentation skills are available in Codex and the API.

Industry Recognition

Leading companies report strong results: Mercor notes GPT-5.4 now tops their APEX-Agents benchmark for professional services, while Harvey reports 91% accuracy on BigLaw Bench for legal document analysis.