Apr 23, 2026

GPT-5.5 Launches: OpenAI's Most Capable Agentic Model Scores 82.7% on Terminal-Bench

OpenAI released GPT-5.5 on April 23, 2026 — a fully retrained model with 82.7% Terminal-Bench 2.0 score — pushing toward an AI super app.

What Is GPT-5.5?

OpenAI released GPT-5.5 on April 23, 2026, rolling it out to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex just six weeks after GPT-5.4. The compressed release cadence shows how quickly the frontier AI race is accelerating.

GPT-5.5 is the first fully retrained base model since GPT-4.5, not an incremental fine-tune. OpenAI co-founder Greg Brockman called it "a new class of intelligence" and "a faster, sharper thinker for fewer tokens compared to 5.4."

Key Benchmark Results

GPT-5.5 posts category-leading numbers across agentic and knowledge-work benchmarks:

Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro
--- | --- | --- | ---
Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5%
GDPval (knowledge work) | 84.9% | - | -
SWE-Bench Pro | 58.6% | 64.3% | -
OSWorld-Verified | 78.7% | - | -
BrowseComp (Pro) | 90.1% | - | -

The 82.7% Terminal-Bench 2.0 result — testing complex command-line workflows — is particularly notable, surpassing Claude Opus 4.7 by more than 13 percentage points. GDPval, which benchmarks AI against professionals across 44 knowledge-work occupations, places GPT-5.5 at 84.9%, a strong indicator for enterprise use cases.

On SWE-Bench Pro (end-to-end GitHub issue resolution), Claude Opus 4.7 leads at 64.3% vs. GPT-5.5's 58.6%, though OpenAI has flagged potential memorization concerns in competitor testing methodologies.
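The head-to-head margins quoted above can be checked with a few lines of Python. This is a minimal sketch: the scores are copied from the table, only rows that list more than one model are included, and the function name `margin_vs_gpt55` is a hypothetical helper introduced for illustration.

```python
# Scores in percent, copied from the benchmark table in this article.
# Only benchmarks with a score for more than one model are listed.
scores = {
    "Terminal-Bench 2.0": {
        "GPT-5.5": 82.7,
        "Claude Opus 4.7": 69.4,
        "Gemini 3.1 Pro": 68.5,
    },
    "SWE-Bench Pro": {
        "GPT-5.5": 58.6,
        "Claude Opus 4.7": 64.3,
    },
}


def margin_vs_gpt55(benchmark: str, rival: str) -> float:
    """Return GPT-5.5's lead over `rival` in percentage points.

    A negative value means GPT-5.5 trails the rival on that benchmark.
    """
    row = scores[benchmark]
    return round(row["GPT-5.5"] - row[rival], 1)


for benchmark, row in scores.items():
    for rival in row:
        if rival != "GPT-5.5":
            lead = margin_vs_gpt55(benchmark, rival)
            print(f"{benchmark}: GPT-5.5 vs {rival}: {lead:+.1f} pp")
```

A positive margin on Terminal-Bench 2.0 and a negative one on SWE-Bench Pro reflect the split described above: GPT-5.5 leads on command-line workflows and trails on GitHub issue resolution.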

Agentic Capabilities

The defining characteristic of GPT-5.5 is its ability to handle extended autonomous workflows with minimal human intervention. The model writes and debugs code, browses the web, fills out spreadsheets, and completes multi-step tasks without requiring a human supervisor at each step.

OpenAI Chief Research Officer Mark Chen highlighted that gains are "especially strong in agentic coding, computer use, knowledge work, and early scientific research — areas where progress depends on reasoning across context and taking action over time." Early testers have documented the model integrating real-time data feeds to construct and execute mock analytical strategies.

OSWorld-Verified at 78.7% measures the model's ability to autonomously navigate and operate within a computer environment — directly relevant to enterprise automation use cases.

Toward the OpenAI Super App

OpenAI framed the GPT-5.5 launch as one step toward a unified service combining ChatGPT, Codex, and an AI browser into a single enterprise tool. The super app vision aligns with enterprise demand for an AI layer that spans research, coding, document production, and autonomous execution within a single subscription.

Bank of New York's CIO noted that GPT-5.5 delivers "meaningful improvements in hallucination resistance" — critical for financial institutions where accuracy errors carry regulatory consequences.

Performance Without Compromise

A notable engineering achievement is that GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while performing at a higher intelligence level. Larger models are typically slower to serve, so maintaining speed parity while improving capability is a meaningful operational advantage for enterprise deployments.

OpenAI also claims improved token efficiency. The company argues that although per-token pricing increased, the model completes tasks in fewer tokens, which offsets the higher price for most workloads.

Pricing

API pricing for GPT-5.5 is:

  • Standard: $5 per million input tokens, $30 per million output tokens
  • Pro: $30 per million input tokens, $180 per million output tokens

This represents a price increase over GPT-5.4 ($2.50/$15 standard), though OpenAI maintains token efficiency improvements make total-cost-of-task comparable or lower for complex jobs.
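To see what "comparable or lower" would require, the sketch below compares the cost of one task under both standard price lists. The prices come from this article; the token counts and the 50% reduction are illustrative assumptions, not measurements, and the dictionary keys are labels chosen here, not official API model identifiers.

```python
# Per-million-token prices in USD, taken from the pricing figures in this article.
# The keys are illustrative labels, not official API model names.
PRICES = {
    "gpt-5.4-standard": {"input": 2.50, "output": 15.00},
    "gpt-5.5-standard": {"input": 5.00, "output": 30.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task given its token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def break_even_ratio(kind: str) -> float:
    """Fraction of GPT-5.4's tokens GPT-5.5 may use before it costs more."""
    return PRICES["gpt-5.4-standard"][kind] / PRICES["gpt-5.5-standard"][kind]


# Hypothetical workload: these token counts are assumptions for illustration.
old_cost = task_cost("gpt-5.4-standard", 200_000, 40_000)
# Assume GPT-5.5 finishes the same task with half the tokens.
new_cost = task_cost("gpt-5.5-standard", 100_000, 20_000)

print(f"GPT-5.4: ${old_cost:.2f}  GPT-5.5: ${new_cost:.2f}")
print(f"Break-even token ratio (input):  {break_even_ratio('input'):.2f}")
print(f"Break-even token ratio (output): {break_even_ratio('output'):.2f}")
```

Because both input and output prices doubled, the break-even point is the same for each: GPT-5.5 has to finish a task in about half the tokens GPT-5.4 would use before the total cost matches. Any smaller saving leaves the task more expensive at standard rates.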

Availability

GPT-5.5 is available immediately to ChatGPT Plus, Pro, Business, and Enterprise subscribers. API access is rolling out shortly. Free-tier access has not yet been announced.

Pros and Cons Analysis

GPT-5.5's terminal and computer-use benchmark leadership makes it the strongest model for autonomous agentic workflows as of its launch date. However, it trails Claude Opus 4.7 on pure coding tasks measured by SWE-Bench Pro, and the significant API price increase may be a barrier for cost-sensitive teams.

Outlook

The six-week gap between GPT-5.4 and GPT-5.5 confirms that frontier labs are no longer following quarterly release schedules. For enterprise buyers, GPT-5.5 represents the most capable autonomous agent platform currently available, particularly for organizations running knowledge-work workflows, code generation at scale, or scientific research pipelines. OpenAI's super app roadmap suggests that model releases will increasingly come bundled with deeper platform integrations, making the underlying model version less relevant than the full product experience.

Pros

  • Category-leading performance on agentic and terminal benchmarks (Terminal-Bench 2.0, OSWorld, BrowseComp)
  • Matches GPT-5.4 latency despite higher intelligence — no speed trade-off
  • Strong enterprise validation with improved hallucination resistance noted by financial-sector users
  • Available immediately to all paid ChatGPT subscribers with API access rolling out

Cons

  • API pricing increased significantly over GPT-5.4 — $5/M input vs $2.50/M previously
  • Trails Claude Opus 4.7 on SWE-Bench Pro coding benchmarks (58.6% vs 64.3%)
  • Free-tier access not yet announced, limiting access for non-paying users
  • Rapid release cadence may create integration stability concerns for enterprise teams

Key Features

1. First fully retrained base model since GPT-4.5, not a fine-tune
2. 82.7% on Terminal-Bench 2.0, leading Claude Opus 4.7 (69.4%) and Gemini 3.1 Pro (68.5%)
3. 84.9% on GDPval across 44 professional knowledge-work occupations
4. 78.7% on OSWorld-Verified for autonomous computer environment operation
5. 90.1% on BrowseComp (Pro variant) for web research
6. Matches GPT-5.4 per-token latency while delivering higher intelligence
7. Released six weeks after GPT-5.4, signaling accelerating release cadence

Key Insights

  • GPT-5.5 is the first model since GPT-4.5 to be built from scratch on a new base, rather than fine-tuned from an existing checkpoint
  • The 82.7% Terminal-Bench 2.0 score is the highest ever recorded by any model on that benchmark as of April 2026
  • OpenAI is framing GPT-5.5 as infrastructure for an AI super app combining ChatGPT, Codex, and an AI browser
  • Six-week release cadence between 5.4 and 5.5 suggests OpenAI has accelerated its internal deployment pipeline significantly
  • The per-token price increase (from $2.50 to $5 for input) may be offset by the model's improved token efficiency on complex tasks
  • GPT-5.5 trails Claude Opus 4.7 on SWE-Bench Pro (58.6% vs 64.3%), suggesting each model still has category-specific strengths
  • BrowseComp at 90.1% for the Pro variant makes GPT-5.5 particularly powerful for research-intensive enterprise workflows
  • Bank of New York's endorsement of hallucination resistance signals growing financial-sector confidence in agentic AI
