GPT-5.5 Launches: OpenAI's Most Capable Agentic Model Scores 82.7% on Terminal-Bench
OpenAI released GPT-5.5 on April 23, 2026: a fully retrained model with an 82.7% Terminal-Bench 2.0 score that pushes toward an AI super app.
What Is GPT-5.5?
OpenAI released GPT-5.5 on April 23, 2026, rolling it out to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex, just six weeks after GPT-5.4. The compressed release cadence shows how quickly the frontier AI race is accelerating.
GPT-5.5 is the first fully retrained base model since GPT-4.5, not an incremental fine-tune. OpenAI co-founder Greg Brockman called it "a new class of intelligence" and "a faster, sharper thinker for fewer tokens compared to 5.4."
Key Benchmark Results
GPT-5.5 posts leading numbers on most agentic and knowledge-work benchmarks, with SWE-Bench Pro the one listed exception:
| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% |
| GDPval (knowledge work) | 84.9% | — | — |
| SWE-Bench Pro | 58.6% | 64.3% | — |
| OSWorld-Verified | 78.7% | — | — |
| BrowseComp (Pro) | 90.1% | — | — |
The 82.7% Terminal-Bench 2.0 result — testing complex command-line workflows — is particularly notable, surpassing Claude Opus 4.7 by more than 13 percentage points. GDPval, which benchmarks AI against professionals across 44 knowledge-work occupations, places GPT-5.5 at 84.9%, a strong indicator for enterprise use cases.
On SWE-Bench Pro (end-to-end GitHub issue resolution), Claude Opus 4.7 leads at 64.3% vs. GPT-5.5's 58.6%, though OpenAI has flagged potential memorization concerns in competitor testing methodologies.
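The gaps quoted above are simple percentage-point differences between table rows. As a minimal sketch, the snippet below recomputes them from the published scores; the `scores` dictionary and the `lead_over` helper are illustrative names, not part of any benchmark tooling.

```python
# Scores copied from the benchmark table above (percent).
scores = {
    "Terminal-Bench 2.0": {
        "GPT-5.5": 82.7,
        "Claude Opus 4.7": 69.4,
        "Gemini 3.1 Pro": 68.5,
    },
    "SWE-Bench Pro": {
        "GPT-5.5": 58.6,
        "Claude Opus 4.7": 64.3,
    },
}


def lead_over(benchmark: str, model: str, rival: str) -> float:
    """Return model's lead over rival in percentage points (negative if behind)."""
    row = scores[benchmark]
    return round(row[model] - row[rival], 1)


terminal_lead = lead_over("Terminal-Bench 2.0", "GPT-5.5", "Claude Opus 4.7")
swe_lead = lead_over("SWE-Bench Pro", "GPT-5.5", "Claude Opus 4.7")

print(f"Terminal-Bench 2.0 lead: {terminal_lead:+.1f} points")  # +13.3
print(f"SWE-Bench Pro lead: {swe_lead:+.1f} points")            # -5.7
```

These are differences in percentage points, not relative improvements, which is how the article's "more than 13 percentage points" figure is stated.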
Agentic Capabilities
The defining characteristic of GPT-5.5 is its ability to handle extended autonomous workflows with minimal human intervention. The model writes and debugs code, browses the web, fills out spreadsheets, and completes multi-step tasks without requiring a human supervisor at each step.
OpenAI Chief Research Officer Mark Chen highlighted that gains are "especially strong in agentic coding, computer use, knowledge work, and early scientific research — areas where progress depends on reasoning across context and taking action over time." Early testers have documented the model integrating real-time data feeds to construct and execute mock analytical strategies.
OSWorld-Verified at 78.7% measures the model's ability to autonomously navigate and operate within a computer environment — directly relevant to enterprise automation use cases.
Toward the OpenAI Super App
OpenAI framed the GPT-5.5 launch as one step toward a unified service combining ChatGPT, Codex, and an AI browser into a single enterprise tool. The super app vision aligns with enterprise demand for an AI layer that spans research, coding, document production, and autonomous execution within a single subscription.
Bank of New York's CIO noted that GPT-5.5 delivers "meaningful improvements in hallucination resistance" — critical for financial institutions where accuracy errors carry regulatory consequences.
Performance Without Compromise
A notable engineering achievement is that GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while performing at a higher intelligence level. Larger models are typically slower to serve, so maintaining speed parity while improving capability is a meaningful operational advantage for enterprise deployments.
OpenAI also claims improved token efficiency. The company argues that although per-token pricing increased, the model completes tasks in fewer tokens, offsetting costs for most workloads.
Pricing
API pricing for GPT-5.5 is:
- Standard: $5 per million input tokens, $30 per million output tokens
- Pro: $30 per million input tokens, $180 per million output tokens
This doubles GPT-5.4's standard rates ($2.50 input / $15 output per million tokens), though OpenAI maintains that token efficiency improvements make total cost per task comparable or lower for complex jobs.
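To make the cost-per-task argument concrete, here is a rough sketch comparing the two standard price points. The prices are the ones quoted above; the token counts are hypothetical, chosen only to show where the break-even falls, and the `Pricing` class is an illustrative helper rather than anything from OpenAI's SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens

    def task_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one task with the given token usage."""
        return (
            input_tokens * self.input_per_million
            + output_tokens * self.output_per_million
        ) / 1_000_000


# Standard API rates as quoted in the article.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)

# Hypothetical workload: 200k input and 40k output tokens on GPT-5.4.
baseline = GPT_5_4.task_cost(200_000, 40_000)

# Same token usage on GPT-5.5: the cost doubles.
same_tokens = GPT_5_5.task_cost(200_000, 40_000)

# GPT-5.5 finishing the task in half the tokens: the cost breaks even.
half_tokens = GPT_5_5.task_cost(100_000, 20_000)

print(f"GPT-5.4 baseline:       ${baseline:.2f}")
print(f"GPT-5.5, same tokens:   ${same_tokens:.2f}")
print(f"GPT-5.5, half as many:  ${half_tokens:.2f}")
```

Because both input and output rates doubled, GPT-5.5 has to use roughly half as many tokens per task to match GPT-5.4's cost. Whether it does will depend on the workload.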
Availability
GPT-5.5 is available immediately to ChatGPT Plus, Pro, Business, and Enterprise subscribers. API access is rolling out shortly. Free-tier access has not yet been announced.
Pros and Cons Analysis
GPT-5.5's terminal and computer-use benchmark leadership makes it the strongest model for autonomous agentic workflows as of its launch date. However, it trails Claude Opus 4.7 on pure coding tasks measured by SWE-Bench Pro, and the significant API price increase may be a barrier for cost-sensitive teams.
Outlook
The six-week gap between GPT-5.4 and GPT-5.5 confirms that frontier labs are no longer following quarterly release schedules. For enterprise buyers, GPT-5.5 represents the most capable autonomous agent platform currently available, particularly for organizations running knowledge-work workflows, code generation at scale, or scientific research pipelines. OpenAI's super app roadmap suggests that model releases will increasingly come bundled with deeper platform integrations, making the underlying model version less relevant than the full product experience.
Pros
- Category-leading performance on agentic and terminal benchmarks (Terminal-Bench 2.0, OSWorld, BrowseComp)
- Matches GPT-5.4 latency despite higher intelligence — no speed trade-off
- Strong enterprise validation with improved hallucination resistance noted by financial-sector users
- Available immediately to all paid ChatGPT subscribers with API access rolling out
Cons
- API pricing increased significantly over GPT-5.4 — $5/M input vs $2.50/M previously
- Trails Claude Opus 4.7 on SWE-Bench Pro coding benchmarks (58.6% vs 64.3%)
- Free-tier access not yet announced, leaving non-paying users without the model for now
- Rapid release cadence may create integration stability concerns for enterprise teams
Key Features
1. First fully retrained base model since GPT-4.5, not a fine-tune
2. 82.7% on Terminal-Bench 2.0, leading Claude Opus 4.7 (69.4%) and Gemini 3.1 Pro (68.5%)
3. 84.9% on GDPval across 44 professional knowledge-work occupations
4. 78.7% on OSWorld-Verified for autonomous computer environment operation
5. 90.1% on BrowseComp (Pro variant) for web research
6. Matches GPT-5.4 per-token latency while delivering higher intelligence
7. Released six weeks after GPT-5.4, signaling accelerating release cadence
Key Insights
- GPT-5.5 is the first model since GPT-4.5 to be built from scratch on a new base, rather than fine-tuned from an existing checkpoint
- The 82.7% Terminal-Bench 2.0 score is the highest ever recorded by any model on that benchmark as of April 2026
- OpenAI is framing GPT-5.5 as infrastructure for an AI super app combining ChatGPT, Codex, and an AI browser
- Six-week release cadence between 5.4 and 5.5 suggests OpenAI has accelerated its internal deployment pipeline significantly
- The per-token price increase (from $2.50 to $5 for input) may be offset by the model's improved token efficiency on complex tasks
- GPT-5.5 trails Claude Opus 4.7 on SWE-Bench Pro (58.6% vs 64.3%), suggesting each model still has category-specific strengths
- BrowseComp at 90.1% for the Pro variant makes GPT-5.5 particularly powerful for research-intensive enterprise workflows
- Bank of New York's endorsement of hallucination resistance signals growing financial-sector confidence in agentic AI
