GPT-5.3-Codex-Spark: OpenAI's First Real-Time Coding Model on Cerebras Hardware
OpenAI launches GPT-5.3-Codex-Spark, delivering over 1,000 tokens per second on Cerebras WSE-3 chips for ultra-low latency coding workflows.
OpenAI Breaks New Ground with Real-Time Coding
On February 12, 2026, OpenAI unveiled GPT-5.3-Codex-Spark, a smaller, speed-optimized variant of GPT-5.3-Codex designed specifically for real-time interactive coding. This marks a significant milestone: it is OpenAI's first production deployment on hardware other than Nvidia, running on Cerebras' Wafer Scale Engine 3.
The release signals a strategic shift in how AI-powered coding tools operate. Rather than prioritizing raw capability at the expense of latency, Codex-Spark focuses on making AI assistance feel instantaneous, enabling developers to integrate AI into their edit-compile-debug loops without waiting.
Key Technical Specifications
GPT-5.3-Codex-Spark delivers over 1,000 tokens per second when running on the Cerebras WSE-3 chip, a massive silicon wafer containing more than 4 trillion transistors. The model features a 128k token context window and operates as a text-only model at launch.
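A quick back-of-the-envelope calculation puts those headline numbers in perspective. The 1,000 tokens-per-second and 128k-context figures come from the announcement; the tokens-per-line estimate below is a rough assumption for illustration.

```python
# Back-of-the-envelope latency math for a 1,000 tok/s model.
# Throughput and context window are from the announcement;
# TOKENS_PER_LINE (~10 tokens per line of code) is an assumption.

THROUGHPUT_TOK_PER_S = 1_000
CONTEXT_WINDOW = 128_000
TOKENS_PER_LINE = 10  # rough assumption

def generation_time_s(lines_of_code: int) -> float:
    """Seconds to stream a patch of the given size at full throughput."""
    return lines_of_code * TOKENS_PER_LINE / THROUGHPUT_TOK_PER_S

# A 200-line patch streams in about 2 seconds...
print(generation_time_s(200))
# ...and the context window fits roughly 12,800 such lines.
print(CONTEXT_WINDOW // TOKENS_PER_LINE)
```

Under these assumptions, even multi-hundred-line edits complete in the low single-digit seconds, which is what makes the model viable inside an edit-compile-debug loop rather than as a fire-and-wait tool.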
OpenAI achieved substantial infrastructure improvements alongside the model release. Through the introduction of persistent WebSocket connections and targeted optimizations, the company reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%.
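To see why persistent connections cut roundtrip overhead, consider a simplified cost model. This is not OpenAI's implementation, and the millisecond figures are hypothetical placeholders; the point is that connection setup (TCP, TLS, and the WebSocket upgrade) is paid once per session instead of once per request.

```python
# Illustrative model (not OpenAI's implementation) of why a persistent
# WebSocket connection reduces per-roundtrip overhead: the handshake
# cost is amortized across all requests on the connection.
# Both millisecond figures are hypothetical placeholders.

SETUP_MS = 120    # hypothetical one-time handshake cost (TCP + TLS + upgrade)
REQUEST_MS = 15   # hypothetical per-roundtrip cost once connected

def total_overhead_ms(requests: int, persistent: bool) -> int:
    """Total connection overhead for N roundtrips, with or without reuse."""
    if persistent:
        return SETUP_MS + requests * REQUEST_MS
    return requests * (SETUP_MS + REQUEST_MS)

# Over 50 roundtrips, reuse amortizes the handshake almost entirely.
print(total_overhead_ms(50, persistent=False))  # pays setup 50 times
print(total_overhead_ms(50, persistent=True))   # pays setup once
```

The more chatty the workload (and interactive coding is very chatty), the larger the share of total latency this amortization removes.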
Benchmark Performance
Despite being a smaller model optimized for speed, Codex-Spark demonstrates strong coding capabilities. On Terminal-Bench 2.0 it achieved 77.3% accuracy, a significant improvement over GPT-5.2-Codex's 64%. In coding-specific tasks it also outperforms both GPT-5.2-Codex and the standard GPT-5.2 model on reasoning.
On SWE-Bench Pro, the model delivers competitive scores while completing tasks in a fraction of the time required by the full GPT-5.3-Codex. This speed-accuracy tradeoff positions it ideally for interactive development scenarios.
Beyond Basic Code Completion
Codex-Spark is not limited to simple code generation. The model handles a broad range of development tasks including debugging, deploying, monitoring, writing product requirement documents, editing copy, conducting user research, writing tests, and tracking metrics. This expanded scope makes it a comprehensive development companion rather than just a code autocomplete tool.
The Cerebras Partnership
The deployment on Cerebras hardware represents OpenAI's first move away from exclusive reliance on Nvidia GPUs for production workloads. The partnership, announced in January 2026, leverages Cerebras' purpose-built Wafer Scale Engine 3 accelerator, which is specifically designed for low-latency inference workloads.
This diversification of hardware partnerships could have significant implications for the AI infrastructure landscape, potentially reducing dependency on a single chip manufacturer and opening new performance optimization pathways.
Availability and Access
Codex-Spark is currently available as a research preview exclusively for ChatGPT Pro subscribers. Usage during the preview does not count toward standard plan limits; it is governed by separate rate limits that may be adjusted over time. API access is expected to arrive soon, at the same pricing as the standard GPT-5.3-Codex model.
What This Means for Developers
The release of Codex-Spark represents a paradigm shift in AI-assisted development. With sub-second response times and strong coding accuracy, developers can now use AI assistance in real-time editing workflows where even a few seconds of latency would break concentration. The 128k context window ensures the model can understand large codebases, while the speed optimizations make it practical for continuous use throughout a coding session.
For teams evaluating AI coding tools, Codex-Spark offers a compelling combination of speed and capability that addresses one of the most common complaints about AI coding assistants: they are powerful but too slow for interactive use.
Pros
- Ultra-fast inference at over 1,000 tokens per second enables seamless real-time coding
- Strong benchmark performance despite being optimized for speed over raw capability
- Broad task coverage including debugging, testing, deployment, and documentation
- 128k context window supports understanding large codebases
- Infrastructure improvements benefit the entire coding experience beyond just model speed
Cons
- Currently limited to ChatGPT Pro subscribers as a research preview
- Text-only model at launch with no multimodal capabilities
- API access not yet available, limiting integration into existing developer workflows
- Smaller model size means some capability tradeoffs compared to full GPT-5.3-Codex
Key Features
GPT-5.3-Codex-Spark is OpenAI's first real-time coding model, delivering over 1,000 tokens per second on Cerebras' Wafer Scale Engine 3 hardware. It features a 128k context window, achieves 77.3% on Terminal-Bench 2.0, and reduces time-to-first-token by 50% through infrastructure optimizations including persistent WebSocket connections. The model handles everything from code generation to debugging, deployment, and documentation.
Key Insights
- First OpenAI production deployment on non-Nvidia hardware, signaling infrastructure diversification
- Over 1,000 tokens per second throughput enables truly real-time coding interactions
- Terminal-Bench 2.0 score of 77.3% significantly exceeds GPT-5.2-Codex's 64%
- 80% reduction in client/server roundtrip overhead through persistent WebSocket connections
- Cerebras WSE-3 chip contains over 4 trillion transistors on a single wafer
- Model scope extends beyond coding to PRDs, user research, and deployment tasks
- Research preview model available only to ChatGPT Pro subscribers initially