DeepSeek V4 Pro and Flash: 1.6T Parameters, 1M Context, Frontier Pricing
DeepSeek released V4-Pro (1.6T params, 49B active) and V4-Flash (284B params, 13B active) on April 24, posting 80.6% on SWE-bench Verified and near-frontier performance at a fraction of rivals' prices.
DeepSeek Returns with V4 — and Another Shock to the Market
Exactly one year after its V3 release rattled Silicon Valley, Chinese AI lab DeepSeek unveiled preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash on April 24, 2026. The two models extend a pattern that has come to define the lab: near-frontier performance delivered at prices that force every major competitor to reconsider its cost structure.
Both models are open-weight, distributed under the MIT license, and available via Hugging Face as well as DeepSeek's own API and web chat interface.
Architecture: Three Key Innovations
DeepSeek built V4 around three architectural advances that distinguish it from its V3.2 predecessor.
Hybrid Attention Architecture combines Compressed Sparse Attention (compression rate 4) with Heavily Compressed Attention (compression rate 128), interleaved throughout the model's layers. This dramatically reduces the KV cache required for long-context inference. At one million tokens, V4-Pro requires only 27% of the inference FLOPs and 10% of the KV cache compared to V3.2 — a 3.7x FLOPs improvement and a 9.5x KV cache reduction. V4-Flash is even more aggressive: 10% of FLOPs (9.8x improvement) and 7% of KV cache (13.7x reduction).
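To make those ratios concrete, here is a back-of-the-envelope sketch in Python. DeepSeek has not disclosed the per-layer mix of the two attention variants in the material summarized here, so the interleaving fraction and the simple blended-cache model below are illustrative assumptions only, not details from the technical report:

```python
def blended_kv_fraction(frac_rate4: float, rate_low: int = 4, rate_high: int = 128) -> float:
    """KV cache size relative to an uncompressed baseline, under a toy model
    where a fraction of layers compresses at rate 4 and the rest at rate 128.
    (Illustrative only: the actual comparison baseline is V3.2, which is
    itself a sparse-attention model.)"""
    return frac_rate4 / rate_low + (1 - frac_rate4) / rate_high

# Under this toy model, the reported ~9.5x overall KV reduction for V4-Pro
# (about 10.5% of baseline) would correspond to roughly 40% of layers
# using the rate-4 variant:
for frac in (0.25, 0.40, 0.60):
    print(f"{frac:.0%} rate-4 layers -> {blended_kv_fraction(frac):.1%} of baseline KV cache")
```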
Manifold-Constrained Hyper-Connections address training stability for deep stacks, preventing the loss spikes that have historically plagued trillion-parameter models. Two additional mechanisms — Anticipatory Routing and SwiGLU Clamping — were introduced specifically to stabilize training at this scale.
Muon Optimizer replaces the standard AdamW optimizer for most parameters, delivering faster convergence and more stable training according to DeepSeek's technical report.
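Muon is publicly documented outside of DeepSeek's report: it accumulates momentum as usual, then approximately orthogonalizes each 2D update matrix with a Newton-Schulz iteration before applying it. The sketch below follows the public Muon reference implementation; DeepSeek's exact variant and hyperparameters are not specified here, so the defaults are assumptions:

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2D update via a quintic Newton-Schulz
    iteration; coefficients follow the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)            # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style step: momentum accumulation, orthogonalize, apply.
    Learning rate and beta are illustrative defaults, not DeepSeek's values."""
    momentum.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)
```

In the public recipe, Muon is applied only to matrix-shaped hidden-layer weights, with embeddings and norms typically left on AdamW, which is consistent with the "most parameters" qualifier above.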
Model Specifications
DeepSeek-V4-Pro:
- 1.6 trillion total parameters; 49 billion activated per forward pass
- 61 layers, 7,168 hidden size
- 384 routed experts (6 active) plus 1 shared expert
- Pre-trained on 33 trillion tokens
- FP8 precision (FP4 for routed experts)
DeepSeek-V4-Flash:
- 284 billion total parameters; 13 billion activated per forward pass
- 43 layers, 4,096 hidden size
- 256 routed experts (6 active) plus 1 shared expert
- Pre-trained on 32 trillion tokens
Both models support native 1-million-token context and thinking as well as non-thinking operation, exposed as three reasoning effort levels: Non-Think (fastest), High (balanced), and Max (extended reasoning with reduced length penalties).
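The review notes below that the API accepts OpenAI's ChatCompletions format. Assuming that also holds for the V4 preview, selecting an effort level might look like the following sketch; the model id and the `reasoning_effort` field are hypothetical placeholders, not a documented schema:

```python
from openai import OpenAI

# Hypothetical: the model name and effort-level field are placeholders;
# consult DeepSeek's API docs for the real parameter names.
client = OpenAI(
    base_url="https://api.deepseek.com",  # DeepSeek's existing API host
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",              # hypothetical preview model id
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"reasoning_effort": "max"},  # hypothetical: non-think | high | max
)
print(response.choices[0].message.content)
```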
Benchmark Results
DeepSeek's headline numbers are striking, especially on coding tasks:
| Benchmark | V4-Pro Score | Notes |
|---|---|---|
| SWE-bench Verified | 80.6% | Competitive with frontier models |
| LiveCodeBench | 93.5% | Top coding benchmark |
| Codeforces Rating | 3,206 | Legendary Grandmaster tier (3000+) among human competitors |
| HMMT 2026 | 95.2% | Advanced mathematics |
| Putnam-200 Pass@8 | 81.0% | University-level math |
| CorpusQA 1M | 62.0% | Long-context retrieval |
Honest disclosure: V4-Pro trails Gemini 3.1 Pro on general knowledge tasks (SimpleQA, MMLU-Pro), falls behind GPT-5.5 on agentic coding benchmarks (Terminal Bench: 67.9% vs. 75.1%), and underperforms Claude Opus 4.6 on long-document retrieval tasks (83.5% vs. 92.9% on MRCR). These are real gaps that enterprise users evaluating V4 for production should assess carefully.
Pricing: The DeepSeek Differentiator
Final public API pricing had not been formally announced at the time of publication, but DeepSeek's established pricing philosophy makes the range predictable. V4-Pro is reported at $3.48 per million output tokens, and V4-Flash is targeted at $0.28 per million output tokens, compared with GPT-5.5's $30 and Claude Opus 4.7's $25 per million output tokens. The API supports both the OpenAI ChatCompletions and Anthropic Messages formats, with no long-context price premium.
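At those reported rates, per-workload economics are easy to sanity-check. The sketch below uses only the output-token prices quoted above (input-token pricing was not given), so the totals are illustrative:

```python
# Output-token prices quoted above, in USD per million tokens.
# These are reported/targeted figures, not confirmed final pricing.
PRICE_PER_M_OUT = {
    "deepseek-v4-pro": 3.48,
    "deepseek-v4-flash": 0.28,
    "gpt-5.5": 30.00,
    "claude-opus-4.7": 25.00,
}

def output_cost(model: str, output_tokens: int) -> float:
    """Cost in USD for a given number of output tokens."""
    return PRICE_PER_M_OUT[model] * output_tokens / 1_000_000

# Example: a heavy agentic workload producing one million 2,000-token completions.
for model in PRICE_PER_M_OUT:
    print(f"{model:18s} ${output_cost(model, 2_000 * 1_000_000):,.0f}")
```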
China Hardware Independence
One strategically significant detail in the V4 release: Huawei announced full Ascend supernode support for V4 deployment. This means Chinese organizations can run DeepSeek V4 entirely on domestic AI hardware without dependence on US GPU exports — a geopolitically meaningful capability given ongoing semiconductor export controls.
Reasoning Modes and Post-Training
Post-training employed separate domain-specialist experts trained via Supervised Fine-Tuning and Group Relative Policy Optimization, unified through On-Policy Distillation where the full model learns from all domain teachers simultaneously. The three reasoning effort levels show meaningful scaling: on SimpleQA-Verified, V4-Pro scores 45.0% in Non-Think mode and 57.9% in Max mode.
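GRPO itself is documented in DeepSeek's earlier DeepSeekMath work: it drops the learned value function used by PPO and instead normalizes rewards within a group of responses sampled for the same prompt. A minimal sketch of that advantage computation, independent of V4's unpublished training details:

```python
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages as in GRPO: each sampled response to the same
    prompt is scored against its group's mean and standard deviation, replacing
    a learned value-function baseline."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Example: 4 sampled answers to one math prompt, rewarded 1 if correct, 0 otherwise.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
```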
Known Limitations
DeepSeek's technical documentation is notably candid about limitations. Long-context retrieval accuracy degrades above 128K tokens, reaching 66% on MRCR at the full 1M context length. Multimodal support is under development and not yet available. The architecture is acknowledged as "relatively complex" by the developers themselves, which could affect third-party deployment and fine-tuning efforts.
Conclusion
DeepSeek V4 Pro and Flash represent a meaningful technical step forward from V3.2, with genuine innovations in attention efficiency, training stability, and reasoning capability. The combination of 80.6% on SWE-bench Verified, a 1-million-token context window, and sub-$4 per-million-token output pricing will keep downward pressure on frontier AI pricing. For cost-sensitive development teams focused primarily on coding and mathematical reasoning, DeepSeek V4 is a serious option. For organizations requiring top-tier multimodal performance, long-document retrieval, or agentic computer use, frontier proprietary models currently hold a measurable edge.
Pros
- Best-in-class pricing at a reported $3.48/M output tokens for a frontier-grade coding model
- SWE-bench 80.6% and LiveCodeBench 93.5% are genuinely competitive on coding benchmarks
- Native 1M-token context with significant KV cache efficiency gains vs V3.2
- Open weights under MIT license enable self-hosting and fine-tuning
- Transparent benchmark reporting including acknowledged weaknesses vs competitors
Cons
- Trails GPT-5.5 on agentic computer use tasks (Terminal Bench 67.9% vs 75.1%)
- Long-context retrieval degrades meaningfully above 128K tokens (66% on MRCR at 1M vs. Claude Opus 4.6's 92.9%)
- No multimodal support at launch; text-only limits applicability for vision-heavy workflows
- Complex architecture may complicate third-party deployment and fine-tuning efforts
Key Features
1. DeepSeek-V4-Pro: 1.6T total parameters, 49B activated per token, 61 layers, pre-trained on 33 trillion tokens
2. DeepSeek-V4-Flash: 284B total parameters, 13B activated per token, 256 routed experts, pre-trained on 32 trillion tokens
3. Hybrid Attention Architecture reduces V4-Pro KV cache by 90% vs V3.2 at 1M token context
4. SWE-bench Verified 80.6% and LiveCodeBench 93.5% — competitive with frontier proprietary models on coding
5. V4-Pro at $3.48/M output tokens vs GPT-5.5 at $30/M — roughly 8.6x cheaper
6. Full Huawei Ascend chip support enables China deployment without US GPU dependencies
7. MIT license with open weights on Hugging Face
Key Insights
- DeepSeek's efficiency gains (27% FLOPs, 10% KV cache vs V3.2 at 1M context) represent a genuine architectural advance, not just incremental scaling
- The $3.48 per million output tokens price point will force further downward pressure on frontier model pricing across the industry
- SWE-bench 80.6% is competitive with Claude Opus 4.7 and marks DeepSeek's strongest coding performance to date
- Huawei Ascend chip compatibility signals that China's domestic AI ecosystem is maturing toward hardware independence from US suppliers
- Honest benchmark disclosure — acknowledging gaps versus Gemini, GPT-5.5, and Claude — increases developer trust in V4's evaluation claims
- The dual-model strategy (Pro + Flash) mirrors Anthropic's Opus/Haiku and OpenAI's flagship/mini tiering, serving both performance and cost-optimization use cases
- MIT licensing continues DeepSeek's open-weight philosophy, enabling enterprise self-hosting without API dependency
- Long-context retrieval degradation above 128K remains a practical limitation for enterprise document processing workflows
