GPT-5.5 Launches: OpenAI's Most Capable Agentic Model Scores 82.7% on Terminal-Bench
OpenAI released GPT-5.5 on April 23, 2026 — a fully retrained model with 82.7% Terminal-Bench 2.0 score — pushing toward an AI super app.
What Is GPT-5.5?
OpenAI released GPT-5.5 on April 23, 2026, rolling it out to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex, just six weeks after GPT-5.4. The compressed release cadence signals how quickly the frontier AI race is moving.
GPT-5.5 is the first fully retrained base model since GPT-4.5, not an incremental fine-tune. OpenAI co-founder Greg Brockman called it "a new class of intelligence" and "a faster, sharper thinker for fewer tokens compared to 5.4."
Key Benchmark Results
GPT-5.5 posts leading numbers on most of the agentic and knowledge-work benchmarks reported, with SWE-Bench Pro the exception:
| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% |
| GDPval (knowledge work) | 84.9% | — | — |
| SWE-Bench Pro | 58.6% | 64.3% | — |
| OSWorld-Verified | 78.7% | — | — |
| BrowseComp (Pro) | 90.1% | — | — |
The 82.7% Terminal-Bench 2.0 result — testing complex command-line workflows — is particularly notable, surpassing Claude Opus 4.7 by more than 13 percentage points. GDPval, which benchmarks AI against professionals across 44 knowledge-work occupations, places GPT-5.5 at 84.9%, a strong indicator for enterprise use cases.
On SWE-Bench Pro (end-to-end GitHub issue resolution), Claude Opus 4.7 leads at 64.3% vs. GPT-5.5's 58.6%, though OpenAI has flagged potential memorization concerns in competitor testing methodologies.
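The margins quoted above are simple percentage-point differences between table cells. The sketch below recomputes them from the published scores; the helper name `lead_in_points` and the `SCORES` layout are illustrative choices, not anything from OpenAI, and blank table cells are represented as `None`.

```python
from typing import Optional

# Scores (percent) copied from the benchmark table in this article.
# None marks cells the article leaves blank.
SCORES: dict[str, dict[str, Optional[float]]] = {
    "Terminal-Bench 2.0": {"GPT-5.5": 82.7, "Claude Opus 4.7": 69.4, "Gemini 3.1 Pro": 68.5},
    "GDPval": {"GPT-5.5": 84.9, "Claude Opus 4.7": None, "Gemini 3.1 Pro": None},
    "SWE-Bench Pro": {"GPT-5.5": 58.6, "Claude Opus 4.7": 64.3, "Gemini 3.1 Pro": None},
}


def lead_in_points(benchmark: str, model: str, rival: str) -> Optional[float]:
    """Return model's lead over rival in percentage points (negative if it trails).

    Returns None when either score is not reported in the table.
    """
    a = SCORES[benchmark][model]
    b = SCORES[benchmark][rival]
    if a is None or b is None:
        return None
    return round(a - b, 1)


for bench in SCORES:
    for rival in ("Claude Opus 4.7", "Gemini 3.1 Pro"):
        gap = lead_in_points(bench, "GPT-5.5", rival)
        if gap is not None:
            print(f"{bench}: GPT-5.5 vs {rival}: {gap:+.1f} pts")
```

A positive value means GPT-5.5 leads and a negative value means it trails, so the Terminal-Bench 2.0 lead and the SWE-Bench Pro deficit fall out of the same subtraction.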
Agentic Capabilities
The defining characteristic of GPT-5.5 is its ability to handle extended autonomous workflows with minimal human intervention. The model writes and debugs code, browses the web, fills out spreadsheets, and completes multi-step tasks without requiring a human supervisor at each step.
OpenAI Chief Research Officer Mark Chen highlighted that gains are "especially strong in agentic coding, computer use, knowledge work, and early scientific research — areas where progress depends on reasoning across context and taking action over time." Early testers have documented the model integrating real-time data feeds to construct and execute mock analytical strategies.
OSWorld-Verified at 78.7% measures the model's ability to autonomously navigate and operate within a computer environment — directly relevant to enterprise automation use cases.
Toward the OpenAI Super App
OpenAI framed the GPT-5.5 launch as one step toward a unified service combining ChatGPT, Codex, and an AI browser into a single enterprise tool. The super app vision aligns with enterprise demand for an AI layer that spans research, coding, document production, and autonomous execution within a single subscription.
Bank of New York's CIO noted that GPT-5.5 delivers "meaningful improvements in hallucination resistance" — critical for financial institutions where accuracy errors carry regulatory consequences.
Performance Without Compromise
A notable engineering achievement is that GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while performing at a higher intelligence level. Larger models are typically slower to serve, so maintaining speed parity while improving capability is a meaningful operational advantage for enterprise deployments.
OpenAI also claims improved token efficiency: although per-token pricing increased, it argues that the model completes tasks in fewer tokens, offsetting the higher cost for most workloads.
Pricing
API pricing for GPT-5.5 is:
- Standard: $5 per million input tokens, $30 per million output tokens
- Pro: $30 per million input tokens, $180 per million output tokens
This doubles GPT-5.4's standard pricing ($2.50/$15 per million input/output tokens), though OpenAI maintains that token efficiency improvements make total cost per task comparable or lower for complex jobs.
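To make the cost-per-task argument concrete, here is a minimal sketch of the break-even arithmetic using the standard-tier prices quoted above. The token counts for the example task and the 40% reduction are hypothetical numbers chosen for illustration, not measurements, and the model assumes input and output token counts shrink by the same fraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """API price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Standard-tier prices as quoted in this article.
GPT_5_4 = Pricing(input_per_m=2.50, output_per_m=15.00)
GPT_5_5 = Pricing(input_per_m=5.00, output_per_m=30.00)


def task_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the given token counts."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def break_even_token_fraction(old: Pricing, new: Pricing,
                              input_tokens: int, output_tokens: int) -> float:
    """Fraction of the old token usage at which the new model costs the same.

    Assumes (hypothetically) that input and output tokens shrink proportionally.
    """
    return task_cost(old, input_tokens, output_tokens) / task_cost(new, input_tokens, output_tokens)


# Hypothetical agentic task: 200k input tokens, 40k output tokens on GPT-5.4.
IN_TOK, OUT_TOK = 200_000, 40_000

old_cost = task_cost(GPT_5_4, IN_TOK, OUT_TOK)
fraction = break_even_token_fraction(GPT_5_4, GPT_5_5, IN_TOK, OUT_TOK)

# Hypothetical scenario: GPT-5.5 finishes the same task with 40% fewer tokens.
new_cost = task_cost(GPT_5_5, int(IN_TOK * 0.6), int(OUT_TOK * 0.6))

print(f"GPT-5.4 cost: ${old_cost:.2f}")
print(f"Break-even token fraction: {fraction:.2f}")
print(f"GPT-5.5 cost at 40% fewer tokens: ${new_cost:.2f}")
```

Because both the input and output prices doubled, the break-even fraction is 0.5 regardless of the input/output mix: under these assumptions GPT-5.5 has to finish a task in roughly half the tokens before it becomes cheaper per task than GPT-5.4.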
Availability
GPT-5.5 is available immediately to ChatGPT Plus, Pro, Business, and Enterprise subscribers. API access is expected to roll out shortly. Free-tier access has not been announced.
Pros and Cons Analysis
GPT-5.5's terminal and computer-use benchmark leadership makes it the strongest model for autonomous agentic workflows as of its launch date. However, it trails Claude Opus 4.7 on pure coding tasks measured by SWE-Bench Pro, and the significant API price increase may be a barrier for cost-sensitive teams.
Outlook
The six-week gap between GPT-5.4 and GPT-5.5 suggests that frontier labs are no longer following quarterly release schedules. For enterprise buyers, GPT-5.5 is, by the agentic benchmarks reported at launch, the most capable autonomous agent platform currently available, particularly for organizations running knowledge-work workflows, code generation at scale, or scientific research pipelines. OpenAI's super app roadmap suggests that model releases will increasingly come bundled with deeper platform integrations, making the underlying model version less relevant than the full product experience.
Editor's Verdict
GPT-5.5 earns a solid recommendation among GPT-series releases.
The strongest case for paying attention is its category-leading performance on agentic and terminal benchmarks (Terminal-Bench 2.0, OSWorld, BrowseComp), which raises the bar for what readers should now expect from competing models. Reinforcing that, it matches GPT-5.4's latency despite higher intelligence, so the capability gain adds practical value rather than just headline appeal. The broader signal worth registering is straightforward: GPT-5.5 is the first model since GPT-4.5 to be built from scratch on a new base, rather than fine-tuned from an existing checkpoint. On the other side of the ledger, API pricing increased significantly over GPT-5.4 ($5/M input vs. $2.50/M previously); that is a real constraint, not a marketing footnote, and it should factor into any serious decision. On top of that, GPT-5.5 trails Claude Opus 4.7 on SWE-Bench Pro (58.6% vs. 64.3%), which narrows the set of teams for whom this is an obvious yes.
For ChatGPT power users, OpenAI API customers, and enterprise teams already running on the OpenAI stack, this is a serious evaluation candidate, not just a curiosity to bookmark. For everyone else, the safer posture is to monitor coverage and revisit once the use cases that matter to your team are demonstrated in the wild.
Pros
- Category-leading performance on agentic and terminal benchmarks (Terminal-Bench 2.0, OSWorld, BrowseComp)
- Matches GPT-5.4 latency despite higher intelligence — no speed trade-off
- Strong enterprise validation with improved hallucination resistance noted by financial-sector users
- Available immediately to all paid ChatGPT subscribers with API access rolling out
Cons
- API pricing increased significantly over GPT-5.4 — $5/M input vs $2.50/M previously
- Trails Claude Opus 4.7 on SWE-Bench Pro coding benchmarks (58.6% vs 64.3%)
- Free-tier access not yet announced, limiting access for non-paying users
- Rapid release cadence may create integration stability concerns for enterprise teams
Key Features
1. First fully retrained base model since GPT-4.5, not a fine-tune
2. 82.7% on Terminal-Bench 2.0, leading Claude Opus 4.7 (69.4%) and Gemini 3.1 Pro (68.5%)
3. 84.9% on GDPval across 44 professional knowledge-work occupations
4. 78.7% on OSWorld-Verified for autonomous computer environment operation
5. 90.1% on BrowseComp (Pro variant) for web research
6. Matches GPT-5.4 per-token latency while delivering higher intelligence
7. Released six weeks after GPT-5.4, signaling accelerating release cadence
Key Insights
- GPT-5.5 is the first model since GPT-4.5 to be built from scratch on a new base, rather than fine-tuned from an existing checkpoint
- The 82.7% Terminal-Bench 2.0 score is the highest ever recorded by any model on that benchmark as of April 2026
- OpenAI is framing GPT-5.5 as infrastructure for an AI super app combining ChatGPT, Codex, and an AI browser
- Six-week release cadence between 5.4 and 5.5 suggests OpenAI has accelerated its internal deployment pipeline significantly
- The per-token price increase (from $2.50 to $5 for input) may be offset by the model's improved token efficiency on complex tasks
- GPT-5.5 trails Claude Opus 4.7 on SWE-Bench Pro (58.6% vs 64.3%), suggesting each model still has category-specific strengths
- BrowseComp at 90.1% for the Pro variant makes GPT-5.5 particularly powerful for research-intensive enterprise workflows
- Bank of New York's endorsement of hallucination resistance signals growing financial-sector confidence in agentic AI
