Apr 23, 2026

GPT

GPT-5.5 Launches: OpenAI's Most Capable Agentic Model Scores 82.7% on Terminal-Bench

OpenAI released GPT-5.5 on April 23, 2026 — a fully retrained model with 82.7% Terminal-Bench 2.0 score — pushing toward an AI super app.

#GPT-5.5#OpenAI#Agentic AI#LLM#ChatGPT

GPT-5.5 Launches: OpenAI's Most Capable Agentic Model Scores 82.7% on Terminal-Bench

AI Summary

OpenAI released GPT-5.5 on April 23, 2026 — a fully retrained model with 82.7% Terminal-Bench 2.0 score — pushing toward an AI super app.

What Is GPT-5.5?

OpenAI released GPT-5.5 on April 23, 2026, rolling it out to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex — just six weeks after GPT-5.4. The compressed release cadence signals how fiercely the frontier AI race is accelerating.

GPT-5.5 is the first fully retrained base model since GPT-4.5, not an incremental fine-tune. OpenAI co-founder Greg Brockman called it "a new class of intelligence" and "a faster, sharper thinker for fewer tokens compared to 5.4."

Key Benchmark Results

GPT-5.5 posts category-leading numbers across agentic and knowledge-work benchmarks:

Benchmark	GPT-5.5	Claude Opus 4.7	Gemini 3.1 Pro
Terminal-Bench 2.0	82.7%	69.4%	68.5%
GDPval (knowledge work)	84.9%	—	—
SWE-Bench Pro	58.6%	64.3%	—
OSWorld-Verified	78.7%	—	—
BrowseComp (Pro)	90.1%	—	—

The 82.7% Terminal-Bench 2.0 result — testing complex command-line workflows — is particularly notable, surpassing Claude Opus 4.7 by more than 13 percentage points. GDPval, which benchmarks AI against professionals across 44 knowledge-work occupations, places GPT-5.5 at 84.9%, a strong indicator for enterprise use cases.

On SWE-Bench Pro (end-to-end GitHub issue resolution), Claude Opus 4.7 leads at 64.3% vs. GPT-5.5's 58.6%, though OpenAI has flagged potential memorization concerns in competitor testing methodologies.

Agentic Capabilities

The defining characteristic of GPT-5.5 is its ability to handle extended autonomous workflows with minimal human intervention. The model writes and debugs code, browses the web, fills out spreadsheets, and completes multi-step tasks without requiring a human supervisor at each step.

OpenAI Chief Research Officer Mark Chen highlighted that gains are "especially strong in agentic coding, computer use, knowledge work, and early scientific research — areas where progress depends on reasoning across context and taking action over time." Early testers have documented the model integrating real-time data feeds to construct and execute mock analytical strategies.

OSWorld-Verified at 78.7% measures the model's ability to autonomously navigate and operate within a computer environment — directly relevant to enterprise automation use cases.

Toward the OpenAI Super App

OpenAI framed the GPT-5.5 launch as one step toward a unified service combining ChatGPT, Codex, and an AI browser into a single enterprise tool. The super app vision aligns with enterprise demand for an AI layer that spans research, coding, document production, and autonomous execution within a single subscription.

Bank of New York's CIO noted that GPT-5.5 delivers "meaningful improvements in hallucination resistance" — critical for financial institutions where accuracy errors carry regulatory consequences.

Performance Without Compromise

A notable engineering achievement is that GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while performing at a higher intelligence level. Larger models are typically slower to serve, so maintaining speed parity while improving capability is a meaningful operational advantage for enterprise deployments.

The model also claims improved token efficiency. OpenAI argues that while per-token pricing increased, the model completes tasks in fewer tokens, offsetting costs for most workloads.

Pricing

API pricing for GPT-5.5 is:

Standard: $5 per million input tokens, $30 per million output tokens
Pro: $30 per million input tokens, $180 per million output tokens

This represents a price increase over GPT-5.4 ($2.50/$15 standard), though OpenAI maintains token efficiency improvements make total-cost-of-task comparable or lower for complex jobs.

Availability

GPT-5.5 is available immediately to ChatGPT Plus, Pro, Business, and Enterprise subscribers. API access is rolling out imminently. Free-tier access is not yet announced.

Pros and Cons Analysis

GPT-5.5's terminal and computer-use benchmark leadership makes it the strongest model for autonomous agentic workflows as of its launch date. However, it trails Claude Opus 4.7 on pure coding tasks measured by SWE-Bench Pro, and the significant API price increase may be a barrier for cost-sensitive teams.

Outlook

The six-week gap between GPT-5.4 and GPT-5.5 confirms that frontier labs are no longer following quarterly release schedules. For enterprise buyers, GPT-5.5 represents the most capable autonomous agent platform currently available, particularly for organizations running knowledge-work workflows, code generation at scale, or scientific research pipelines. OpenAI's super app roadmap suggests that model releases will increasingly come bundled with deeper platform integrations, making the underlying model version less relevant than the full product experience.

Editor's Verdict

GPT-5.5 Launches: OpenAI's Most Capable Agentic Model Scores 82.7% on Terminal-Bench earns a solid recommendation within the gpt space.

The strongest case for paying attention is category-leading performance on agentic and terminal benchmarks (Terminal-Bench 2.0, OSWorld, BrowseComp), which raises the bar for what readers should now expect from peers in this space. Reinforcing that, matches GPT-5.4 latency despite higher intelligence — no speed trade-off adds practical value rather than just headline appeal. The broader signal worth registering is straightforward: GPT-5.5 is the first model since GPT-4.5 to be built from scratch on a new base, rather than fine-tuned from an existing checkpoint. On the other side of the ledger, API pricing increased significantly over GPT-5.4 — $5/M input vs $2.50/M previously is a real constraint, not a marketing footnote, and it should factor into any serious decision. Layered on top of that, trails Claude Opus 4.7 on SWE-Bench Pro coding benchmarks (58.6% vs 64.3%) narrows the set of teams for whom this is an obvious yes.

For ChatGPT power users, OpenAI API customers, and enterprise teams already running on the OpenAI stack, this is a serious evaluation candidate, not just a curiosity to bookmark. For everyone else, the safer posture is to monitor coverage and revisit once the use cases that matter to your team are demonstrated in the wild.

Pros

Category-leading performance on agentic and terminal benchmarks (Terminal-Bench 2.0, OSWorld, BrowseComp)
Matches GPT-5.4 latency despite higher intelligence — no speed trade-off
Strong enterprise validation with improved hallucination resistance noted by financial-sector users
Available immediately to all paid ChatGPT subscribers with API access rolling out

Cons

API pricing increased significantly over GPT-5.4 — $5/M input vs $2.50/M previously
Trails Claude Opus 4.7 on SWE-Bench Pro coding benchmarks (58.6% vs 64.3%)
Free-tier access not yet announced, limiting access for non-paying users
Rapid release cadence may create integration stability concerns for enterprise teams

References

OpenAI releases GPT-5.5, bringing company one step closer to an AI 'super app' | TechCrunch OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 | MarkTechPost OpenAI launches GPT-5.5 just weeks after GPT-5.4 as AI race accelerates | Fortune OpenAI Releases GPT-5.5: Faster, Smarter — And Pricier | Decrypt

Comments0

Key Features

1. First fully retrained base model since GPT-4.5 — not a fine-tune 2. 82.7% on Terminal-Bench 2.0, leading Claude Opus 4.7 (69.4%) and Gemini 3.1 Pro (68.5%) 3. 84.9% on GDPval across 44 professional knowledge-work occupations 4. 78.7% on OSWorld-Verified for autonomous computer environment operation 5. 90.1% on BrowseComp (Pro variant) for web research 6. Matches GPT-5.4 per-token latency while delivering higher intelligence 7. Released six weeks after GPT-5.4, signaling accelerating release cadence

Key Insights

GPT-5.5 is the first model since GPT-4.5 to be built from scratch on a new base, rather than fine-tuned from an existing checkpoint
The 82.7% Terminal-Bench 2.0 score is the highest ever recorded by any model on that benchmark as of April 2026
OpenAI is framing GPT-5.5 as infrastructure for an AI super app combining ChatGPT, Codex, and an AI browser
Six-week release cadence between 5.4 and 5.5 suggests OpenAI has accelerated its internal deployment pipeline significantly
The per-token price increase (from $2.50 to $5 for input) may be offset by the model's improved token efficiency on complex tasks
GPT-5.5 trails Claude Opus 4.7 on SWE-Bench Pro (58.6% vs 64.3%), suggesting each model still has category-specific strengths
BrowseComp at 90.1% for the Pro variant makes GPT-5.5 particularly powerful for research-intensive enterprise workflows
Bank of New York's endorsement of hallucination resistance signals growing financial-sector confidence in agentic AI

Was this review helpful?

Twitter/X

Related AI Reviews

Codex Micro Review: OpenAI's First Hardware, a $230 Keypad

NEWGPT

185

Visit Official Site

🟠Anthropic Claude 💎Google Gemini 🤖OpenAI GPT