
DeepSeek officially released the V4 preview series on April 24. The series is open-sourced under the MIT license, and the model weights have been published on both Hugging Face and ModelScope. According to the DeepSeek V4 technical report, V4-Pro-Max (the strongest reasoning mode) scored 3206 on the Codeforces benchmark, surpassing GPT-5.4 (3168).
Specifications for Two MoE Model Architectures
According to the DeepSeek V4 technical report, the V4 series includes two mixture-of-experts (MoE) models:
V4-Pro: Total parameters 1.6T, 49B activated per token, supports a 1M token context
V4-Flash: Total parameters 284B, 13B activated per token, also supports a 1M token context
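To make the sparsity of the two models concrete, the share of parameters activated per token can be worked out directly from the figures above. The snippet below is only illustrative arithmetic; the dictionary layout and function name are made up for this example and do not come from the report.

```python
# Parameter counts as stated in the article, in billions.
MODELS = {
    "V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters that is used for a single token."""
    return active_b / total_b


fractions = {name: active_fraction(**spec) for name, spec in MODELS.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

By this measure V4-Pro activates about 3.1% of its parameters per token and V4-Flash about 4.6%.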
According to the technical report, at a 1M-token context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache is reduced to 10% of V3.2's. The savings come mainly from an upgraded hybrid attention architecture that combines compressed sparse attention (CSA) with heavily compressed attention (HCA). The pretraining corpus exceeds 32T tokens, and the training optimizer has been switched to Muon.
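The two ratios quoted above can be turned into reduction factors with a line of arithmetic each. The sketch below does only that; the 100 GB baseline in the last lines is a made-up placeholder to show how the ratio would be applied, not a measured V3.2 figure.

```python
# Ratios quoted in the article for V4-Pro relative to V3.2 at a 1M-token context.
FLOPS_RATIO = 0.27     # per-token inference FLOPs
KV_CACHE_RATIO = 0.10  # KV cache size


def reduction_factor(ratio: float) -> float:
    """Convert 'X% of the baseline' into 'N times smaller than the baseline'."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return 1.0 / ratio


def scaled_kv_cache_gb(baseline_gb: float, ratio: float = KV_CACHE_RATIO) -> float:
    """Apply the quoted KV cache ratio to a baseline size in GB."""
    return baseline_gb * ratio


print(f"FLOPs per token: about {reduction_factor(FLOPS_RATIO):.1f}x lower")
print(f"KV cache: about {reduction_factor(KV_CACHE_RATIO):.0f}x smaller")

# Hypothetical baseline purely for illustration: 100 GB is NOT a reported number.
hypothetical_baseline_gb = 100.0
print(f"{hypothetical_baseline_gb:.0f} GB baseline -> "
      f"{scaled_kv_cache_gb(hypothetical_baseline_gb):.0f} GB")
```

In other words, the stated ratios correspond to roughly 3.7x fewer FLOPs per token and a 10x smaller KV cache.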
Post-Training Methodology: On-Policy Distillation Replaces Mixed Reinforcement Learning
According to the DeepSeek V4 technical report, the core change in V4 post-training is that on-policy distillation (OPD) completely replaces the mixed reinforcement learning (mixed RL) stage used in V3.2. The new pipeline has two steps. First, domain-specific expert models are trained separately for areas such as math, code, agents, and instruction following, each with SFT followed by GRPO reinforcement learning. Then the capabilities of more than ten experts are distilled into a single unified model through multi-teacher OPD, with logit alignment used to avoid the capability conflicts common in traditional methods.
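The two ingredients named above, sampling from the student itself and matching teacher logits, can be illustrated with a toy sketch. This is not the report's implementation: the article does not give the loss, so the per-token reverse KL used here is an assumption (a common choice in on-policy distillation), and the toy policies, the routing by domain name, and every function name are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Logits = List[float]
Policy = Callable[[Sequence[int]], Logits]  # token prefix -> next-token logits


def log_softmax(logits: Logits) -> List[float]:
    """Numerically stable log-probabilities."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits: Logits, teacher_logits: Logits) -> float:
    """KL(student || teacher) over the full vocabulary (assumed loss)."""
    ls, lt = log_softmax(student_logits), log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))


def sample_rollout(student: Policy, prompt: Sequence[int], steps: int,
                   rng: random.Random) -> List[int]:
    """On-policy part: the continuation is sampled from the student."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = [math.exp(lp) for lp in log_softmax(student(tokens))]
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def opd_loss(student: Policy, teachers: Dict[str, Policy], domain: str,
             prompt: Sequence[int], steps: int, rng: random.Random) -> float:
    """Mean per-token divergence between the student and the domain's teacher."""
    rollout = sample_rollout(student, prompt, steps, rng)
    teacher = teachers[domain]  # multi-teacher part: route by domain (assumption)
    losses = [reverse_kl(student(rollout[:t]), teacher(rollout[:t]))
              for t in range(len(prompt), len(rollout))]
    return sum(losses) / len(losses)


def make_toy_policy(preferred: int, vocab_size: int = 4,
                    strength: float = 2.0) -> Policy:
    """Stand-in for a model: always favors one token, ignoring the prefix."""
    def policy(prefix: Sequence[int]) -> Logits:
        logits = [0.0] * vocab_size
        logits[preferred] = strength
        return logits
    return policy


rng = random.Random(0)
student = make_toy_policy(0)
teachers = {"code": make_toy_policy(0), "math": make_toy_policy(1)}
loss_code = opd_loss(student, teachers, "code", prompt=[0], steps=5, rng=rng)
loss_math = opd_loss(student, teachers, "math", prompt=[0], steps=5, rng=rng)
print(f"code loss: {loss_code:.4f}, math loss: {loss_math:.4f}")
```

In this toy setup the student already agrees with the "code" teacher, so that loss is zero, while the "math" teacher prefers a different token and yields a positive loss that a real training loop would minimize by gradient descent.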
The report also introduces a generative reward model (GRM). For tasks that are difficult to verify with rules, the GRM is trained on a small amount of diverse human-annotated data, so that a single model can handle both generation and evaluation.
Benchmark Results: Leading on Coding, Still a Gap in Knowledge Reasoning
According to the DeepSeek V4 technical report, V4-Pro-Max compares with Opus 4.6 Max, GPT-5.4 xHigh, and Gemini 3.1 Pro High as follows (the recently released GPT-5.5 and Opus 4.7 are not included):
Codeforces: 3206 (GPT-5.4: 3168 / Gemini 3.1 Pro: 3052) → Highest across the board
LiveCodeBench: 93.5 → Highest across the board
SWE Verified: 80.6, behind Opus 4.6’s 80.8 by 0.2 percentage points
GPQA Diamond: 90.1, behind Gemini 3.1 Pro’s 94.3
SimpleQA-Verified: 57.9, behind Gemini 3.1 Pro’s 75.6
HLE: 37.7, behind Gemini 3.1 Pro’s 44.4
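The gaps in the list above can be computed directly from the quoted scores. The snippet below is just that arithmetic; LiveCodeBench is left out because the article gives no competitor score for it, and the variable names are made up for this example.

```python
# (benchmark, V4-Pro-Max score, comparison model quoted in the article, its score)
RESULTS = [
    ("Codeforces", 3206.0, "GPT-5.4", 3168.0),
    ("SWE Verified", 80.6, "Opus 4.6", 80.8),
    ("GPQA Diamond", 90.1, "Gemini 3.1 Pro", 94.3),
    ("SimpleQA-Verified", 57.9, "Gemini 3.1 Pro", 75.6),
    ("HLE", 37.7, "Gemini 3.1 Pro", 44.4),
]

# Positive gap: V4-Pro-Max is ahead; negative gap: it trails the quoted model.
gaps = {name: round(v4 - other, 1) for name, v4, _, other in RESULTS}

for name, v4, rival, other in RESULTS:
    status = "ahead of" if v4 > other else "behind"
    print(f"{name}: {v4} vs {other} ({abs(gaps[name])} {status} {rival})")
```

The largest deficit on this list is SimpleQA-Verified at 17.7 points, while the SWE Verified gap is only 0.2.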
The technical report also states that the above comparisons exclude the recently released GPT-5.5 and Opus 4.7, and the gap between V4 and the latest generation of closed-source models remains to be verified by third-party evaluations.
Frequently Asked Questions
What are the open-source license terms for the DeepSeek V4 preview version, and where can the weights be obtained?
According to DeepSeek's official announcement on April 24, the V4 series is open-sourced under the MIT license, which covers both commercial and academic use. Model weights are available on Hugging Face and ModelScope.
What are the differences in parameter scale between DeepSeek V4-Pro and V4-Flash?
According to the DeepSeek V4 technical report, V4-Pro has 1.6T total parameters, with 49B activated per token; V4-Flash has 284B total parameters, with 13B activated per token. Both models support a 1M-token context.
What are the benchmark comparison results for DeepSeek V4-Pro-Max versus GPT-5.4 and Gemini 3.1 Pro?
According to the DeepSeek V4 technical report, V4-Pro-Max surpasses GPT-5.4 and Gemini 3.1 Pro on the Codeforces (3206) and LiveCodeBench (93.5) benchmarks, but still lags behind Gemini 3.1 Pro on knowledge-intensive benchmarks (GPQA Diamond, SimpleQA-Verified, HLE). The comparison set excludes GPT-5.5 and Opus 4.7.