GPT-5.3-Codex-Spark: OpenAI's First Real-Time Coding Model on Cerebras Hardware
OpenAI launches GPT-5.3-Codex-Spark, delivering over 1,000 tokens per second on Cerebras WSE-3 chips for ultra-low latency coding workflows.
OpenAI Breaks New Ground with Real-Time Coding
On February 12, 2026, OpenAI unveiled GPT-5.3-Codex-Spark, a smaller, speed-optimized variant of GPT-5.3-Codex designed specifically for real-time interactive coding. This marks a significant milestone: it is OpenAI's first production deployment on hardware other than Nvidia, running on Cerebras' Wafer Scale Engine 3.
The release signals a strategic shift in how AI-powered coding tools operate. Rather than prioritizing raw capability at the expense of latency, Codex-Spark focuses on making AI assistance feel instantaneous, enabling developers to integrate AI into their edit-compile-debug loops without waiting.
Key Technical Specifications
GPT-5.3-Codex-Spark delivers over 1,000 tokens per second when running on the Cerebras WSE-3 chip, a massive silicon wafer containing more than 4 trillion transistors. The model features a 128k token context window and operates as a text-only model at launch.
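A quick back-of-the-envelope calculation puts those headline numbers in perspective. The 1,000 tokens-per-second and 128k-context figures come from the announcement; the tokens-per-line estimate below is a rough assumption for illustration.

```python
# Back-of-the-envelope latency math for a 1,000 tok/s model.
# Throughput and context window are from the announcement;
# TOKENS_PER_LINE (~10 tokens per line of code) is an assumption.

THROUGHPUT_TOK_PER_S = 1_000
CONTEXT_WINDOW = 128_000
TOKENS_PER_LINE = 10  # rough assumption

def generation_time_s(lines_of_code: int) -> float:
    """Seconds to stream a patch of the given size at full throughput."""
    return lines_of_code * TOKENS_PER_LINE / THROUGHPUT_TOK_PER_S

# A 200-line patch streams in about 2 seconds...
print(generation_time_s(200))
# ...and the context window fits roughly 12,800 such lines.
print(CONTEXT_WINDOW // TOKENS_PER_LINE)
```

Under these assumptions, even multi-hundred-line edits complete in the low single-digit seconds, which is what makes the model viable inside an edit-compile-debug loop rather than as a fire-and-wait tool.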
OpenAI achieved substantial infrastructure improvements alongside the model release. Through the introduction of persistent WebSocket connections and targeted optimizations, the company reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%.
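To see why persistent connections cut roundtrip overhead, consider a simplified cost model. This is not OpenAI's implementation, and the millisecond figures are hypothetical placeholders; the point is that connection setup (TCP, TLS, and the WebSocket upgrade) is paid once per session instead of once per request.

```python
# Illustrative model (not OpenAI's implementation) of why a persistent
# WebSocket connection reduces per-roundtrip overhead: the handshake
# cost is amortized across all requests on the connection.
# Both millisecond figures are hypothetical placeholders.

SETUP_MS = 120    # hypothetical one-time handshake cost (TCP + TLS + upgrade)
REQUEST_MS = 15   # hypothetical per-roundtrip cost once connected

def total_overhead_ms(requests: int, persistent: bool) -> int:
    """Total connection overhead for N roundtrips, with or without reuse."""
    if persistent:
        return SETUP_MS + requests * REQUEST_MS
    return requests * (SETUP_MS + REQUEST_MS)

# Over 50 roundtrips, reuse amortizes the handshake almost entirely.
print(total_overhead_ms(50, persistent=False))  # pays setup 50 times
print(total_overhead_ms(50, persistent=True))   # pays setup once
```

The more chatty the workload (and interactive coding is very chatty), the larger the share of total latency this amortization removes.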
Benchmark Performance
Despite being a smaller model optimized for speed, Codex-Spark demonstrates strong coding capabilities. On Terminal-Bench 2.0 it achieved 77.3% accuracy, a significant improvement over GPT-5.2-Codex's 64%. In coding-specific tasks it also outperforms both GPT-5.2-Codex and the standard GPT-5.2 model on reasoning.
On SWE-Bench Pro, the model delivers competitive scores while completing tasks in a fraction of the time required by the full GPT-5.3-Codex. This speed-accuracy tradeoff positions it ideally for interactive development scenarios.
Beyond Basic Code Completion
Codex-Spark is not limited to simple code generation. The model handles a broad range of development tasks including debugging, deploying, monitoring, writing product requirement documents, editing copy, conducting user research, writing tests, and tracking metrics. This expanded scope makes it a comprehensive development companion rather than just a code autocomplete tool.
The Cerebras Partnership
The deployment on Cerebras hardware represents OpenAI's first move away from exclusive reliance on Nvidia GPUs for production workloads. The partnership, announced in January 2026, leverages Cerebras' purpose-built Wafer Scale Engine 3 accelerator, which is specifically designed for low-latency inference workloads.
This diversification of hardware partnerships could have significant implications for the AI infrastructure landscape, potentially reducing dependency on a single chip manufacturer and opening new performance optimization pathways.
Availability and Access
Codex-Spark is currently available as a research preview exclusively for ChatGPT Pro subscribers. Usage during the preview does not count toward standard plan limits; it is governed by separate rate limits that may be adjusted over time. API access is expected to arrive soon, at the same pricing as the standard GPT-5.3-Codex model.
What This Means for Developers
The release of Codex-Spark represents a paradigm shift in AI-assisted development. With sub-second response times and strong coding accuracy, developers can now use AI assistance in real-time editing workflows where even a few seconds of latency would break concentration. The 128k context window ensures the model can understand large codebases, while the speed optimizations make it practical for continuous use throughout a coding session.
For teams evaluating AI coding tools, Codex-Spark offers a compelling combination of speed and capability that addresses one of the most common complaints about AI coding assistants: they are powerful but too slow for interactive use.
Pros
- Ultra-fast inference at over 1,000 tokens per second enables seamless real-time coding
- Strong benchmark performance despite being optimized for speed over raw capability
- Broad task coverage including debugging, testing, deployment, and documentation
- 128k context window supports understanding large codebases
- Infrastructure improvements benefit the entire coding experience beyond just model speed
Cons
- Currently limited to ChatGPT Pro subscribers as a research preview
- Text-only model at launch with no multimodal capabilities
- API access not yet available, limiting integration into existing developer workflows
- Smaller model size means some capability tradeoffs compared to full GPT-5.3-Codex
Key Features
GPT-5.3-Codex-Spark is OpenAI's first real-time coding model, delivering over 1,000 tokens per second on Cerebras' Wafer Scale Engine 3 hardware. It features a 128k context window, achieves 77.3% on Terminal-Bench 2.0, and reduces time-to-first-token by 50% through infrastructure optimizations including persistent WebSocket connections. The model handles everything from code generation to debugging, deployment, and documentation.
Key Insights
- First OpenAI production deployment on non-Nvidia hardware, signaling infrastructure diversification
- Over 1,000 tokens per second throughput enables truly real-time coding interactions
- Terminal-Bench 2.0 score of 77.3% significantly exceeds GPT-5.2-Codex's 64%
- 80% reduction in client/server roundtrip overhead through persistent WebSocket connections
- Cerebras WSE-3 chip contains over 4 trillion transistors on a single wafer
- Model scope extends beyond coding to PRDs, user research, and deployment tasks
- Research preview model available only to ChatGPT Pro subscribers initially