Mar 06, 2026

OpenAI Launches GPT-5.4: Computer Use, 1M Token Context, and Tool Search

OpenAI releases GPT-5.4 with native computer control, a 1-million-token context window, and a new Tool Search system that cuts token usage by 47%.

Tags: OpenAI, GPT-5.4, Computer Use, Tool Search, Context Window

OpenAI's Most Capable Model Yet

On March 5, 2026, OpenAI released GPT-5.4, a new foundation model the company describes as its most capable and efficient frontier model for professional work. The release comes in three variants: standard GPT-5.4, GPT-5.4 Thinking (a reasoning-focused version), and GPT-5.4 Pro (optimized for maximum performance). It consolidates capabilities that were previously spread across separate models into a single system.

GPT-5.4 is rolling out to ChatGPT Plus, Team, and Pro subscribers, as well as through the OpenAI API. The model advances three areas: native computer use, a one-million-token context window (the largest OpenAI has offered), and a new approach to tool management that cuts token usage by 47% in OpenAI's testing.

Native Computer Use: A First for OpenAI

GPT-5.4 is the first general-purpose OpenAI model that can take direct control of a computer. The model can click, type, and navigate software applications using screenshots and mouse/keyboard commands, without relying on a separate specialized model.
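
The screenshot-and-action cycle described above can be pictured as a simple observe, decide, act loop. The sketch below is only an illustration under stated assumptions: `FakeDesktop`, `Action`, `scripted_policy`, and `run_episode` are hypothetical names, the "screenshot" is a text string rather than an image, and the policy is a hard-coded stand-in for the model. It does not reflect OpenAI's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "type", "click", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Hypothetical stand-in for a desktop: one text field and one Submit button."""

    SUBMIT_BUTTON = (100, 200)  # assumed button coordinates

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real system would return image bytes; a string keeps the sketch runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.field += action.text
        elif action.kind == "click" and (action.x, action.y) == self.SUBMIT_BUTTON:
            self.submitted = True


def scripted_policy(observation: str) -> Action:
    """Placeholder for the model: maps the latest observation to the next action."""
    if "field=''" in observation:
        return Action("type", text="hello")
    if "submitted=False" in observation:
        return Action("click", x=100, y=200)
    return Action("done")


def run_episode(desktop: FakeDesktop,
                policy: Callable[[str], Action],
                max_steps: int = 10) -> List[Action]:
    """Observe, decide, act, and repeat until the policy signals completion."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            break
        desktop.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    desktop = FakeDesktop()
    for step in run_episode(desktop, scripted_policy):
        print(step)
```

The `max_steps` cap is a design choice worth keeping in any real loop of this kind, since a policy that never emits a terminal action would otherwise run indefinitely.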

This capability positions GPT-5.4 as a direct competitor to Anthropic's Claude computer use feature, which launched in late 2024. The difference is that GPT-5.4 integrates computer control natively into the same model that handles conversation, coding, and reasoning, rather than requiring a separate tool or model.

On the OSWorld-Verified benchmark, which measures real-world computer use tasks, GPT-5.4 scores 75.0%. This not only exceeds GPT-5.2's score of 47.3% but also surpasses the measured human baseline of 72.4%. On WebArena Verified, another computer use benchmark, GPT-5.4 also sets a new record.

One Million Token Context Window

The API version of GPT-5.4 supports context windows of up to one million tokens, the largest context window ever offered by OpenAI. This is a substantial increase from the 128,000-token limit of GPT-4 Turbo and positions the model for enterprise workflows that require processing large codebases, lengthy legal documents, or extensive research corpora.
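
To get a feel for what fits in a window of this size, a rough budget check can help. The sketch below assumes about four characters per token, which is only a rule of thumb for English text; a real check would use the model's tokenizer. The function names and the 50,000-token output reserve are illustrative assumptions.

```python
from typing import Iterable

CONTEXT_LIMIT = 1_000_000   # tokens, per the article
CHARS_PER_TOKEN = 4         # rough heuristic (assumption), not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: Iterable[str],
                    reserved_for_output: int = 50_000,
                    limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if the documents plus an output reserve fit within the limit."""
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserved_for_output <= limit


if __name__ == "__main__":
    corpus = ["x" * 400_000]                        # about 100,000 estimated tokens
    print(fits_in_context(corpus))                  # fits in a 1M-token window
    print(fits_in_context(corpus, limit=128_000))   # does not fit in 128K with the reserve
```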

The expanded context window is particularly significant for agentic applications, where models need to plan, execute, and verify tasks across long horizons while maintaining coherent state across many interactions.

Tool Search: A New Approach to Efficiency

Perhaps the most technically innovative feature of GPT-5.4 is Tool Search, a new system for managing tool calling that rethinks how models interact with APIs and external services.

Traditionally, all tool definitions are included in every API request, consuming significant tokens even when most tools are not needed. With Tool Search, GPT-5.4 receives only a lightweight list of available tools along with a search capability. When the model needs to use a specific tool, it dynamically looks up that tool's full definition and appends it to the conversation on demand.
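
The pattern described above, a lightweight list up front with full definitions fetched on demand, can be sketched in a few lines. Everything here is a hypothetical illustration: the registry contents, the function names, and the naive keyword matching are assumptions, and the article does not describe how OpenAI actually implements the search.

```python
from typing import Dict, List

# Hypothetical tool definitions; real ones would be full JSON schemas.
TOOL_REGISTRY: Dict[str, dict] = {
    "get_weather": {
        "description": "Look up the current weather for a city",
        "parameters": {"city": "string", "units": "string"},
    },
    "send_email": {
        "description": "Send an email message to a recipient",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "query_database": {
        "description": "Run a read-only SQL query against the analytics database",
        "parameters": {"sql": "string"},
    },
}


def lightweight_index(registry: Dict[str, dict]) -> List[str]:
    """What the model sees up front: tool names only, no schemas."""
    return sorted(registry)


def search_tools(registry: Dict[str, dict], query: str, limit: int = 1) -> List[str]:
    """Naive keyword search over tool names and descriptions (illustrative only)."""
    words = set(query.lower().split())
    scored = []
    for name, definition in registry.items():
        haystack = (name.replace("_", " ") + " " + definition["description"]).lower()
        score = len(words & set(haystack.split()))
        if score:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


def load_definitions(registry: Dict[str, dict],
                     names: List[str],
                     loaded: Dict[str, dict]) -> Dict[str, dict]:
    """Append full definitions to the conversation state only when requested."""
    for name in names:
        loaded.setdefault(name, registry[name])
    return loaded


if __name__ == "__main__":
    print(lightweight_index(TOOL_REGISTRY))
    matches = search_tools(TOOL_REGISTRY, "weather in Paris")
    print(load_definitions(TOOL_REGISTRY, matches, {}))
```

In this toy version only the `get_weather` schema ever enters the conversation; the other two definitions cost nothing beyond their names in the index.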

The results are substantial. In testing on 250 tasks from Scale's MCP Atlas benchmark with 36 MCP servers enabled, the Tool Search configuration reduced total token usage by 47% while maintaining accuracy. For developers building complex agentic systems with many tool integrations, this translates directly into lower API costs and faster response times.
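
As a back-of-the-envelope illustration of what a 47% reduction means for a bill, consider the arithmetic below. The baseline token count and the price per million tokens are hypothetical placeholders, not published figures, and the calculation assumes every token is billed at the same rate, which real pricing may not do.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float = 0.47) -> int:
    """Tokens remaining after applying a fractional reduction (0.47 means 47%)."""
    return round(baseline_tokens * (1 - reduction))


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost at a flat per-million-token price (a simplifying assumption)."""
    return tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    baseline = 1_000_000   # hypothetical tokens used without Tool Search
    price = 2.00           # hypothetical USD per million tokens
    reduced = tokens_after_reduction(baseline)
    print(reduced)                                   # 530000
    print(f"{cost_usd(baseline, price):.2f}")        # 2.00
    print(f"{cost_usd(reduced, price):.2f}")         # 1.06
```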

Benchmark Performance

Beyond computer use, GPT-5.4 delivers broad improvements across professional benchmarks:

| Benchmark | GPT-5.4 | GPT-5.2 | Improvement |
|---|---|---|---|
| OSWorld-Verified | 75.0% | 47.3% | +27.7 points |
| GDPval (Knowledge Work) | 83.0% | N/A | Record score |
| Claim Accuracy | +33% | Baseline | Per-claim error reduction |
| Response Accuracy | +18% | Baseline | Overall error reduction |
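
The table mixes percentage-point gains with relative reductions, so a short calculation may help keep the two apart. The benchmark scores come from the figures above; the baseline of 100 erroneous claims is a hypothetical number chosen only to make the relative reduction concrete.

```python
def point_gain(new_score: float, old_score: float) -> float:
    """Absolute difference in percentage points."""
    return round(new_score - old_score, 1)


def relative_gain_pct(new_score: float, old_score: float) -> float:
    """Improvement relative to the old score, as a percentage."""
    return round((new_score - old_score) / old_score * 100, 1)


def errors_after(baseline_errors: float, reduction_pct: float) -> float:
    """Errors remaining after a relative reduction."""
    return baseline_errors * (100 - reduction_pct) / 100


if __name__ == "__main__":
    print(point_gain(75.0, 47.3))         # gain over GPT-5.2 in points
    print(relative_gain_pct(75.0, 47.3))  # the same gain relative to GPT-5.2's score
    print(point_gain(75.0, 72.4))         # margin over the human baseline in points
    print(errors_after(100, 33))          # hypothetical 100 erroneous claims, 33% fewer
    print(errors_after(100, 18))          # hypothetical 100 erroneous responses, 18% fewer
```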

The model consolidates the coding strengths of GPT-5.3-Codex, improved reasoning from GPT-5.4 Thinking, and the new agentic capabilities for autonomous desktop, browser, and application navigation.

Three Model Variants

OpenAI is offering GPT-5.4 in three configurations to serve different use cases:

GPT-5.4 (Standard): The default model for ChatGPT subscribers, balancing capability with speed for everyday tasks including conversation, coding, analysis, and now computer use.

GPT-5.4 Thinking: A reasoning-focused variant that applies extended chain-of-thought processing to complex problems. Designed for tasks requiring multi-step logic, mathematical proofs, or scientific reasoning.

GPT-5.4 Pro: Optimized for maximum performance on the most demanding professional tasks. Available for users who need the highest accuracy on complex enterprise workflows.

Pros

  • Native computer use surpasses the human baseline on OSWorld-Verified at 75.0%, making autonomous software navigation practically viable
  • Tool Search reduces token usage by 47% in multi-tool scenarios, directly lowering API costs for developers building agentic systems
  • One-million-token context window enables processing of entire codebases, legal documents, and research corpora in a single request
  • Consolidates coding, reasoning, and agentic capabilities into one model instead of requiring separate specialized models
  • 33% reduction in per-claim errors compared to GPT-5.2 demonstrates meaningful progress on hallucination reduction

Cons

  • Computer use capabilities remain in early stages, and real-world reliability across diverse software environments is unproven at scale
  • The one-million-token context window is API-only, with ChatGPT subscribers likely receiving a smaller limit
  • Three model variants (Standard, Thinking, Pro) add complexity for users deciding which version to use
  • Pricing details for the Pro variant and extended context windows have not been fully disclosed

Outlook

GPT-5.4 represents OpenAI's clearest statement yet that the future of AI is agentic. By combining computer use, massive context, and efficient tool management in a single model, OpenAI is building the foundation for AI systems that can autonomously complete complex multi-step workflows.

The Tool Search innovation is particularly worth watching. As the AI ecosystem moves toward standardized tool protocols like MCP, the ability to efficiently manage hundreds or thousands of tool definitions becomes a critical infrastructure challenge. GPT-5.4's approach of dynamic tool retrieval could become the standard pattern.

The competitive landscape is intensifying. Anthropic's Claude already offers computer use capabilities, and Google's Gemini is pushing agentic features through Pixel devices. GPT-5.4's benchmark-leading performance on computer use tasks gives OpenAI a strong position, but the real test will be reliability in production deployments.

Conclusion

GPT-5.4 is a significant release that advances the state of the art in three important dimensions: autonomous computer control, context length, and tool efficiency. The model's ability to exceed human performance on computer use benchmarks while simultaneously reducing operational costs through Tool Search makes it compelling for both individual developers and enterprise customers. For teams building agentic AI applications, GPT-5.4 is the most complete single-model solution currently available from any major provider.



Key Features

OpenAI launched GPT-5.4 on March 5, 2026, introducing native computer use that scores 75.0% on OSWorld-Verified (surpassing the 72.4% human baseline), a 1-million-token context window (the largest in OpenAI's history), and Tool Search, which reduces token usage by 47% in multi-tool scenarios. The model is available in three variants: Standard, Thinking (reasoning-focused), and Pro (maximum performance). GPT-5.4 consolidates coding capabilities from GPT-5.3-Codex with improved reasoning and agentic desktop navigation, achieving 33% fewer per-claim errors than GPT-5.2.

Key Insights

  • GPT-5.4 is the first general-purpose OpenAI model with native computer use, scoring 75.0% on OSWorld-Verified and surpassing the 72.4% human baseline
  • Tool Search dynamically retrieves tool definitions on demand, reducing token usage by 47% across 250 tasks on Scale's MCP Atlas benchmark
  • The 1-million-token API context window is the largest OpenAI has ever offered, positioning GPT-5.4 for enterprise-scale document processing
  • GPT-5.4 consolidates coding, reasoning, and agentic capabilities that were previously split across GPT-5.3-Codex and other specialized models
  • Per-claim error rates dropped 33% compared to GPT-5.2, with overall response errors down 18%
  • Three model variants (Standard, Thinking, Pro) allow users to optimize for speed, reasoning depth, or maximum accuracy
  • The model scored a record 83% on GDPval, OpenAI's benchmark for knowledge work tasks
  • Computer use integration positions GPT-5.4 as a direct competitor to Anthropic Claude's computer use feature
