Mar 06, 2026

OpenAI Launches GPT-5.4: Computer Use, 1M Token Context, and Tool Search

OpenAI releases GPT-5.4 with native computer control, a 1-million-token context window, and a new Tool Search system that cuts token usage by 47%.

OpenAI's Most Capable Model Yet

On March 5, 2026, OpenAI released GPT-5.4, a new foundation model the company describes as its most capable and efficient frontier model for professional work. The release comes in three variants: standard GPT-5.4, GPT-5.4 Thinking (a reasoning-focused version), and GPT-5.4 Pro (optimized for maximum performance). It consolidates capabilities that were previously spread across separate models into a single unified system.

GPT-5.4 is rolling out to ChatGPT Plus, Team, and Pro subscribers, as well as through the OpenAI API. The model advances three key areas: native computer use, a one-million-token context window, and a new approach to tool management that reduces token usage.

Native Computer Use: A First for OpenAI

GPT-5.4 is the first general-purpose OpenAI model that can take direct control of a computer. The model can click, type, and navigate software applications using screenshots and mouse/keyboard commands, without relying on a separate specialized model.
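The screenshot-then-act cycle described above can be illustrated with a toy loop. This is a minimal sketch, not OpenAI's implementation: `FakeDesktop`, `scripted_policy`, and the action classes are hypothetical names invented for illustration, the "screenshot" is a text description rather than pixels, and the policy is a hard-coded stub standing in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Done:
    pass


Action = Union[Click, TypeText, Done]


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen: one text field at a fixed position."""
    focused: bool = False
    field_text: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return pixels; a text description keeps this runnable.
        return f"focused={self.focused} text={self.field_text!r}"

    def apply(self, action: Action) -> None:
        if isinstance(action, Click):
            # Clicking inside the field's (made-up) bounding box focuses it.
            self.focused = 100 <= action.x <= 300 and 40 <= action.y <= 60
            self.log.append(f"click({action.x},{action.y})")
        elif isinstance(action, TypeText):
            if self.focused:
                self.field_text += action.text
            self.log.append(f"type({action.text!r})")


def scripted_policy(observation: str) -> Action:
    """Hard-coded stub standing in for the model's choice of next action."""
    if "focused=False" in observation:
        return Click(200, 50)
    if "text=''" in observation:
        return TypeText("hello")
    return Done()


def run_episode(desktop: FakeDesktop,
                policy: Callable[[str], Action],
                max_steps: int = 10) -> int:
    """Observe, pick an action, apply it; stop when the policy signals Done."""
    for step in range(max_steps):
        action = policy(desktop.screenshot())
        if isinstance(action, Done):
            return step
        desktop.apply(action)
    return max_steps
```

Calling `run_episode(FakeDesktop(), scripted_policy)` clicks the field, types into it, and then stops. In a real deployment the policy would be a model call and the observation an actual screenshot; the `max_steps` cap is one simple guard against an agent that never decides it is finished.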

This capability positions GPT-5.4 as a direct competitor to Anthropic's Claude computer use feature, which launched in late 2024. The difference is that GPT-5.4 integrates computer control natively into the same model that handles conversation, coding, and reasoning, rather than requiring a separate tool or model.

On the OSWorld-Verified benchmark, which measures real-world computer use tasks, GPT-5.4 scores 75.0%. This not only exceeds GPT-5.2's score of 47.3% but also surpasses the measured human baseline of 72.4%. On WebArena Verified, another computer use benchmark, GPT-5.4 also sets a new record.

One Million Token Context Window

The API version of GPT-5.4 supports context windows of up to one million tokens, the largest context window ever offered by OpenAI. This is a substantial increase from the 128,000-token limit of GPT-4 Turbo and positions the model for enterprise workflows that require processing large codebases, lengthy legal documents, or extensive research corpora.
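Whether a given corpus fits in a window of that size can be roughly estimated before sending anything. The sketch below is only a back-of-the-envelope check: the 4-characters-per-token ratio is a rough assumption for English text rather than a real tokenizer, and `reserved_for_output` is a hypothetical allowance for the model's reply.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # figure quoted in the article
CHARS_PER_TOKEN = 4               # rough assumption, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count via ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents, reserved_for_output=50_000):
    """Return (estimated_total_tokens, fits) for a list of document strings."""
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = CONTEXT_LIMIT_TOKENS - reserved_for_output
    return total, total <= budget
```

For anything close to the limit, counting with the provider's actual tokenizer is the safer choice, since code and non-English text can tokenize very differently from this estimate.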

The expanded context window is particularly significant for agentic applications, where models need to plan, execute, and verify tasks across long horizons while maintaining coherent state across many interactions.

Tool Search: A New Approach to Efficiency

Perhaps the most technically innovative feature of GPT-5.4 is Tool Search, a new system for managing tool calling that rethinks how models interact with APIs and external services.

Traditionally, all tool definitions are included in every API request, consuming significant tokens even when most tools are not needed. With Tool Search, GPT-5.4 receives only a lightweight list of available tools along with a search capability. When the model needs to use a specific tool, it dynamically looks up that tool's full definition and appends it to the conversation on demand.
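The pattern described above (a lightweight catalog on every request, full definitions fetched only when needed) can be sketched as follows. This is an illustration of the general idea, not OpenAI's API: the tool names, schemas, the `tool_definition` role, and the keyword-matching search are all hypothetical, and the article does not describe how the real retrieval works.

```python
import json

# Hypothetical tool definitions; names and schemas are illustrative only.
FULL_DEFINITIONS = {
    "get_weather": {
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "create_invoice": {
        "description": "Create an invoice for a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount_cents": {"type": "integer"},
            },
            "required": ["customer_id", "amount_cents"],
        },
    },
    "search_tickets": {
        "description": "Search support tickets by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"keyword": {"type": "string"}},
            "required": ["keyword"],
        },
    },
}


def lightweight_catalog(definitions):
    """Names plus one-line descriptions: the part sent with every request."""
    return [{"name": name, "description": d["description"]}
            for name, d in definitions.items()]


def search_tools(definitions, query):
    """Naive keyword match, standing in for whatever retrieval the real system uses."""
    words = query.lower().split()
    return [name for name, d in definitions.items()
            if any(w in name.lower() or w in d["description"].lower()
                   for w in words)]


def load_definition(context, definitions, name):
    """Append one tool's full definition to the running context on demand."""
    context.append({"role": "tool_definition", "name": name, **definitions[name]})
    return context


def size(obj):
    """Serialized length in characters, used here as a crude proxy for tokens."""
    return len(json.dumps(obj))
```

With only three tools the saving is small, but the gap between `size(lightweight_catalog(FULL_DEFINITIONS))` and `size(FULL_DEFINITIONS)` grows with the number and complexity of schemas, which is the situation the article describes with dozens of MCP servers enabled.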

The results are substantial. In testing on 250 tasks from Scale's MCP Atlas benchmark with 36 MCP servers enabled, the Tool Search configuration reduced total token usage by 47% while maintaining accuracy. For developers building complex agentic systems with many tool integrations, this translates directly into lower API costs and faster response times.
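To see what a 47% reduction means in practice, the arithmetic can be worked through with made-up numbers. The baseline token count and the price per million tokens below are hypothetical, chosen only to make the calculation concrete; the 47% figure is the one reported above.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float = 0.47) -> int:
    """Tokens remaining after applying a fractional reduction."""
    return round(baseline_tokens * (1 - reduction))


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost at a flat per-million-token price (a simplification)."""
    return tokens / 1_000_000 * price_per_million_usd


baseline = 200_000                      # hypothetical tokens per task
reduced = tokens_after_reduction(baseline)
price = 2.00                            # hypothetical USD per million tokens

print(f"tokens: {baseline} -> {reduced}")
print(f"cost per task: ${cost_usd(baseline, price):.3f} -> ${cost_usd(reduced, price):.3f}")
```

Real bills will not scale quite this cleanly: the 47% applies to total token usage, while APIs typically price input and output tokens differently, so the actual saving depends on where the removed tokens sat.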

Benchmark Performance

Beyond computer use, GPT-5.4 delivers broad improvements across professional benchmarks:

Benchmark	GPT-5.4	GPT-5.2	Change
OSWorld-Verified	75.0%	47.3%	+27.7 points
GDPval (Knowledge Work)	83.0%	N/A	Record score
Per-claim errors	33% fewer	Baseline	Reduction in per-claim error rate
Response errors	18% fewer	Baseline	Reduction in overall response error rate

The model consolidates the coding strengths of GPT-5.3-Codex, improved reasoning, and the new agentic capabilities for autonomous desktop, browser, and application navigation.

Three Model Variants

OpenAI is offering GPT-5.4 in three configurations to serve different use cases:

GPT-5.4 (Standard): The default model for ChatGPT subscribers, balancing capability with speed for everyday tasks including conversation, coding, analysis, and now computer use.

GPT-5.4 Thinking: A reasoning-focused variant that applies extended chain-of-thought processing to complex problems. Designed for tasks requiring multi-step logic, mathematical proofs, or scientific reasoning.

GPT-5.4 Pro: Optimized for maximum performance on the most demanding professional tasks. Available for users who need the highest accuracy on complex enterprise workflows.

Pros

Native computer use surpasses the human baseline on OSWorld-Verified at 75.0%, making autonomous software navigation practically viable
Tool Search reduces token usage by 47% in multi-tool scenarios, directly lowering API costs for developers building agentic systems
One-million-token context window enables processing of entire codebases, legal documents, and research corpora in a single request
Consolidates coding, reasoning, and agentic capabilities into one model instead of requiring separate specialized models
33% reduction in per-claim errors compared to GPT-5.2 demonstrates meaningful progress on hallucination reduction

Cons

Computer use capabilities remain in early stages, and real-world reliability across diverse software environments is unproven at scale
The one-million-token context window is API-only, with ChatGPT subscribers likely receiving a smaller limit
Three model variants (Standard, Thinking, Pro) add complexity for users deciding which version to use
Pricing details for the Pro variant and extended context windows have not been fully disclosed

Outlook

GPT-5.4 represents OpenAI's clearest statement yet that the future of AI is agentic. By combining computer use, massive context, and efficient tool management in a single model, OpenAI is building the foundation for AI systems that can autonomously complete complex multi-step workflows.

The Tool Search innovation is particularly worth watching. As the AI ecosystem moves toward standardized tool protocols like MCP, the ability to efficiently manage hundreds or thousands of tool definitions becomes a critical infrastructure challenge. GPT-5.4's approach of dynamic tool retrieval could become the standard pattern.

The competitive landscape is intensifying. Anthropic's Claude already offers computer use capabilities, and Google's Gemini is pushing agentic features through Pixel devices. GPT-5.4's benchmark-leading performance on computer use tasks gives OpenAI a strong position, but the real test will be reliability in production deployments.

Conclusion

GPT-5.4 is a significant release that advances the state of the art in three important dimensions: autonomous computer control, context length, and tool efficiency. The model's ability to exceed human performance on computer use benchmarks while simultaneously reducing operational costs through Tool Search makes it compelling for both individual developers and enterprise customers. For teams building agentic AI applications, GPT-5.4 is the most complete single-model solution currently available from any major provider.

Editor's Verdict

GPT-5.4 stands out as one of the more compelling GPT developments we've covered recently.

The strongest case for paying attention is that native computer use surpasses the human baseline on OSWorld-Verified at 75.0%, which raises the bar for what readers should now expect from competing models. Reinforcing that, Tool Search reduces token usage by 47% in multi-tool scenarios, which lowers API costs for agentic applications and adds practical value rather than just headline appeal. The broader signal is straightforward: GPT-5.4 is the first general-purpose OpenAI model with native computer use. On the other side of the ledger, the reliability of computer use across diverse real-world software environments remains unproven at scale. That is a real constraint, not a marketing footnote, and it should factor into any serious decision. In addition, the 1-million-token context window is API-only and not available to all ChatGPT subscribers, which narrows the set of teams for whom this is an obvious yes.

For ChatGPT power users, OpenAI API customers, and enterprise teams already running on the OpenAI stack, the answer here is to pilot now and plan for production use. For everyone else, the safer posture is to monitor coverage and revisit once the use cases that matter to your team are demonstrated in the wild.

Key Features

OpenAI launched GPT-5.4 on March 5, 2026, introducing native computer use that scores 75.0% on OSWorld-Verified (surpassing the 72.4% human baseline), a 1-million-token context window (the largest in OpenAI's history), and Tool Search, which reduces token usage by 47% in multi-tool scenarios. The model is available in three variants: Standard, Thinking (reasoning-focused), and Pro (maximum performance). GPT-5.4 consolidates coding capabilities from GPT-5.3-Codex with improved reasoning and agentic desktop navigation, achieving 33% fewer per-claim errors than GPT-5.2.

Key Insights

GPT-5.4 is the first general-purpose OpenAI model with native computer use, scoring 75.0% on OSWorld-Verified and surpassing the 72.4% human baseline
Tool Search dynamically retrieves tool definitions on demand, reducing token usage by 47% across 250 tasks on Scale's MCP Atlas benchmark
The 1-million-token API context window is the largest OpenAI has ever offered, positioning GPT-5.4 for enterprise-scale document processing
GPT-5.4 consolidates coding, reasoning, and agentic capabilities that were previously split across GPT-5.3-Codex and other specialized models
Per-claim error rates dropped 33% compared to GPT-5.2, with overall response errors down 18%
Three model variants (Standard, Thinking, Pro) allow users to optimize for speed, reasoning depth, or maximum accuracy
The model scored a record 83% on GDPval, OpenAI's benchmark for knowledge work tasks
Computer use integration positions GPT-5.4 as a direct competitor to Anthropic Claude's computer use feature
