OpenAI Launches GPT-5.4: Computer Use, 1M Token Context, and Tool Search
OpenAI releases GPT-5.4 with native computer control, a 1-million-token context window, and a new Tool Search system that cuts token usage by 47%.
OpenAI's Most Capable Model Yet
On March 5, 2026, OpenAI released GPT-5.4, a new foundation model the company describes as its most capable and efficient frontier model for professional work. It is available in three variants: standard GPT-5.4, GPT-5.4 Thinking (a reasoning-focused version), and GPT-5.4 Pro (optimized for maximum performance). The release consolidates capabilities that were previously spread across separate models into a single unified system.
GPT-5.4 is rolling out to ChatGPT Plus, Team, and Pro subscribers, as well as through the OpenAI API. The model advances three areas: native computer use, a one-million-token context window (the largest OpenAI has offered), and a new approach to tool management that cut token usage by 47% in benchmark testing.
Native Computer Use: A First for OpenAI
GPT-5.4 is the first general-purpose OpenAI model that can take direct control of a computer. The model can click, type, and navigate software applications using screenshots and mouse/keyboard commands, without relying on a separate specialized model.
This capability positions GPT-5.4 as a direct competitor to Anthropic's Claude computer use feature, which launched in late 2024. The difference is that GPT-5.4 integrates computer control natively into the same model that handles conversation, coding, and reasoning, rather than requiring a separate tool or model.
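The screenshot-and-action loop described above can be sketched roughly as follows. This is a toy simulation under stated assumptions, not OpenAI's API: `Action`, `FakeDesktop`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary standing in for an image, and a hand-written policy stands in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One UI action the model might emit (hypothetical schema)."""
    kind: str            # "click", "type", or "done"
    target: str = ""     # element to click, for "click"
    text: str = ""       # text to enter, for "type"


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: one text field and one submit button."""
    focused: str = ""
    field_text: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; a dict keeps the sketch runnable.
        return {"focused": self.focused, "field_text": self.field_text,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            else:
                self.focused = action.target
        elif action.kind == "type" and self.focused == "search_box":
            self.field_text += action.text


def scripted_policy(observation: dict, goal: str) -> Action:
    """Placeholder for the model: picks the next action from the observation."""
    if observation["submitted"]:
        return Action("done")
    if observation["focused"] != "search_box":
        return Action("click", target="search_box")
    if observation["field_text"] != goal:
        return Action("type", text=goal)
    return Action("click", target="submit")


def run_agent(desktop: FakeDesktop, goal: str, max_steps: int = 10) -> bool:
    """Observe, act, repeat until the policy reports completion or steps run out."""
    for _ in range(max_steps):
        action = scripted_policy(desktop.screenshot(), goal)
        if action.kind == "done":
            return True
        desktop.apply(action)
    return False
```

The `max_steps` cap is a design choice worth keeping in any real agent loop: it bounds how long the agent can act when the environment does not respond as expected.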
On the OSWorld-Verified benchmark, which measures real-world computer use tasks, GPT-5.4 scores 75.0%. This not only exceeds GPT-5.2's score of 47.3% but also surpasses the measured human baseline of 72.4%. On WebArena Verified, another computer use benchmark, GPT-5.4 also sets a new record.
One Million Token Context Window
The API version of GPT-5.4 supports context windows of up to one million tokens, the largest context window ever offered by OpenAI. This is a substantial increase from the 128,000-token limit of GPT-4 Turbo and positions the model for enterprise workflows that require processing large codebases, lengthy legal documents, or extensive research corpora.
The expanded context window is particularly significant for agentic applications, where models need to plan, execute, and verify tasks across long horizons while maintaining coherent state across many interactions.
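Whether a given corpus actually fits in a single request is easy to estimate ahead of time. The sketch below is illustrative only: the four-characters-per-token ratio is a rough assumption for English text (a real tokenizer will give different counts), and `estimate_tokens`, `fits_in_context`, and the `reserved_for_output` default are hypothetical names and values.

```python
CONTEXT_LIMIT = 1_000_000   # tokens, the API limit cited in the article
CHARS_PER_TOKEN = 4         # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Crude token estimate; a real tokenizer will give different counts."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: list[str],
                    reserved_for_output: int = 50_000) -> tuple[bool, int]:
    """Return (fits, estimated_total) for sending all documents in one request."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_LIMIT, total
```

Reserving part of the budget for the model's output matters in agentic runs, where intermediate tool results and plans accumulate in the same window as the input documents.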
Tool Search: A New Approach to Efficiency
Perhaps the most technically innovative feature of GPT-5.4 is Tool Search, a new system for managing tool calling that rethinks how models interact with APIs and external services.
Traditionally, all tool definitions are included in every API request, consuming significant tokens even when most tools are not needed. With Tool Search, GPT-5.4 receives only a lightweight list of available tools along with a search capability. When the model needs to use a specific tool, it dynamically looks up that tool's full definition and appends it to the conversation on demand.
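The contrast between sending every definition up front and loading definitions on demand can be sketched as follows. This is a simplified illustration, not OpenAI's implementation: `ToolRegistry`, its methods, and the three sample tools are hypothetical, the keyword match is a stand-in because the article does not describe how the search works, and token counts use a rough four-characters-per-token assumption.

```python
import json


class ToolRegistry:
    """Holds full tool definitions but exposes only names and one-line summaries."""

    def __init__(self, tools: dict[str, dict]):
        self._tools = tools

    def lightweight_list(self) -> list[dict]:
        # What the model would see up front: no parameter schemas.
        return [{"name": name, "summary": spec["summary"]}
                for name, spec in self._tools.items()]

    def search(self, query: str) -> list[str]:
        # Naive keyword match; the real retrieval method is not described in the article.
        words = query.lower().split()
        return [name for name, spec in self._tools.items()
                if any(w in spec["summary"].lower() for w in words)]

    def load(self, name: str) -> dict:
        # Full definition, appended to the conversation only when needed.
        return self._tools[name]


def approx_tokens(payload) -> int:
    """Rough size estimate (assumes about 4 characters per token)."""
    return len(json.dumps(payload)) // 4


TOOLS = {
    "get_weather": {"summary": "Current weather for a city",
                    "parameters": {"city": "string", "units": "celsius|fahrenheit"}},
    "create_invoice": {"summary": "Create a billing invoice for a customer",
                       "parameters": {"customer_id": "string", "amount": "number",
                                      "currency": "string", "due_date": "date"}},
    "send_email": {"summary": "Send an email message",
                   "parameters": {"to": "string", "subject": "string", "body": "string"}},
}

registry = ToolRegistry(TOOLS)
upfront = approx_tokens(TOOLS)            # every schema in every request
matches = registry.search("weather")      # the model asks only for what it needs
on_demand = approx_tokens(registry.lightweight_list()) + sum(
    approx_tokens(registry.load(name)) for name in matches)
```

With three small tools the gap between `upfront` and `on_demand` is modest. The saving grows with the number of tools and the size of their schemas, since the lightweight list stays short while the full definitions do not.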
The results are substantial. In testing on 250 tasks from Scale's MCP Atlas benchmark with 36 MCP servers enabled, the Tool Search configuration reduced total token usage by 47% while maintaining accuracy. For developers building complex agentic systems with many tool integrations, this translates directly into lower API costs and faster response times.
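To put the 47% figure in cost terms, here is a small worked calculation. The monthly token volume and the per-million-token price are placeholders rather than OpenAI pricing, and the sketch assumes the benchmark reduction carries over unchanged to the workload, which may not hold in practice.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float = 0.47) -> int:
    """Tokens remaining after applying a fractional reduction (0.47 = 47%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


def monthly_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD at a flat per-token price (placeholder pricing, not OpenAI's)."""
    return tokens / 1_000_000 * usd_per_million_tokens


baseline = 200_000_000                      # hypothetical monthly token volume
reduced = tokens_after_reduction(baseline)  # 53% of the baseline remains
price = 2.00                                # hypothetical USD per million tokens
saving = monthly_cost(baseline, price) - monthly_cost(reduced, price)
```

Under these placeholder numbers the bill falls in proportion to the token count, so the saving scales linearly with both volume and price.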
Benchmark Performance
Beyond computer use, GPT-5.4 delivers broad improvements across professional benchmarks:
| Benchmark / metric | GPT-5.4 | GPT-5.2 | Change |
|---|---|---|---|
| OSWorld-Verified | 75.0% | 47.3% | +27.7 points |
| GDPval (knowledge work) | 83.0% | N/A | Record score |
| Per-claim errors | 33% fewer | Baseline | Relative reduction |
| Overall response errors | 18% fewer | Baseline | Relative reduction |
The model combines the coding strengths of GPT-5.3-Codex with improved reasoning and new agentic capabilities for autonomous desktop, browser, and application navigation.
Three Model Variants
OpenAI is offering GPT-5.4 in three configurations to serve different use cases:
GPT-5.4 (Standard): The default model for ChatGPT subscribers, balancing capability with speed for everyday tasks including conversation, coding, analysis, and now computer use.
GPT-5.4 Thinking: A reasoning-focused variant that applies extended chain-of-thought processing to complex problems. Designed for tasks requiring multi-step logic, mathematical proofs, or scientific reasoning.
GPT-5.4 Pro: Optimized for maximum performance on the most demanding professional tasks. Available for users who need the highest accuracy on complex enterprise workflows.
Pros
- Native computer use surpasses the human baseline on OSWorld-Verified at 75.0%, making autonomous software navigation practically viable
- Tool Search reduces token usage by 47% in multi-tool scenarios, directly lowering API costs for developers building agentic systems
- One-million-token context window enables processing of entire codebases, legal documents, and research corpora in a single request
- Consolidates coding, reasoning, and agentic capabilities into one model instead of requiring separate specialized models
- 33% reduction in per-claim errors compared to GPT-5.2 demonstrates meaningful progress on hallucination reduction
Cons
- Computer use capabilities remain in early stages, and real-world reliability across diverse software environments is unproven at scale
- The one-million-token context window is API-only, with ChatGPT subscribers likely receiving a smaller limit
- Three model variants (Standard, Thinking, Pro) add complexity for users deciding which version to use
- Pricing details for the Pro variant and extended context windows have not been fully disclosed
Outlook
GPT-5.4 represents OpenAI's clearest statement yet that the future of AI is agentic. By combining computer use, massive context, and efficient tool management in a single model, OpenAI is building the foundation for AI systems that can autonomously complete complex multi-step workflows.
The Tool Search innovation is particularly worth watching. As the AI ecosystem moves toward standardized tool protocols like MCP, the ability to efficiently manage hundreds or thousands of tool definitions becomes a critical infrastructure challenge. GPT-5.4's approach of dynamic tool retrieval could become the standard pattern.
The competitive landscape is intensifying. Anthropic's Claude already offers computer use capabilities, and Google's Gemini is pushing agentic features through Pixel devices. GPT-5.4's benchmark-leading performance on computer use tasks gives OpenAI a strong position, but the real test will be reliability in production deployments.
Conclusion
GPT-5.4 is a significant release that advances the state of the art in three important dimensions: autonomous computer control, context length, and tool efficiency. The model's ability to exceed human performance on computer use benchmarks while simultaneously reducing operational costs through Tool Search makes it compelling for both individual developers and enterprise customers. For teams building agentic AI applications, GPT-5.4 is the most complete single-model solution currently available from any major provider.
Editor's Verdict
GPT-5.4 stands out as one of the more compelling GPT developments we've covered recently.
The strongest case for paying attention is that native computer use surpasses the 72.4% human baseline on OSWorld-Verified with a score of 75.0%, which raises the bar for what readers should now expect from competing models. Reinforcing that, Tool Search reduces token usage by 47% in multi-tool scenarios, which directly lowers API costs for agentic applications and adds practical value rather than just headline appeal. On the other side of the ledger, computer use reliability across diverse real-world software environments remains unproven at scale; that is a real constraint, not a marketing footnote, and it should factor into any serious decision. In addition, the 1-million-token context window is API-only and not available to all ChatGPT subscribers, which narrows the set of teams for whom this is an obvious yes.
For ChatGPT power users, OpenAI API customers, and enterprise teams already running on the OpenAI stack, the answer here is to pilot now and plan for production use. For everyone else, the safer posture is to monitor coverage and revisit once the use cases that matter to your team are demonstrated in the wild.
Key Features
OpenAI launched GPT-5.4 on March 5, 2026, introducing native computer use that scores 75.0% on OSWorld-Verified (surpassing the 72.4% human baseline), a 1-million-token context window (the largest in OpenAI's history), and Tool Search, which reduces token usage by 47% in multi-tool scenarios. The model is available in three variants: Standard, Thinking (reasoning-focused), and Pro (maximum performance). GPT-5.4 combines coding capabilities from GPT-5.3-Codex with improved reasoning and agentic desktop navigation, and it makes 33% fewer per-claim errors than GPT-5.2.
Key Insights
- GPT-5.4 is the first OpenAI model with native computer use, scoring 75.0% on OSWorld-Verified and surpassing the 72.4% human baseline
- Tool Search dynamically retrieves tool definitions on demand, reducing token usage by 47% across 250 tasks on Scale's MCP Atlas benchmark
- The 1-million-token API context window is the largest OpenAI has ever offered, positioning GPT-5.4 for enterprise-scale document processing
- GPT-5.4 consolidates coding, reasoning, and agentic capabilities that were previously split across GPT-5.3-Codex and other specialized models
- Per-claim error rates dropped 33% compared to GPT-5.2, with overall response errors down 18%
- Three model variants (Standard, Thinking, Pro) allow users to optimize for speed, reasoning depth, or maximum accuracy
- The model scored a record 83% on GDPval, OpenAI's benchmark for knowledge work tasks
- Computer use integration positions GPT-5.4 as a direct competitor to Anthropic Claude's computer use feature
