Yifan Zhang Discloses Complete DeepSeek V4 Technical Specs: 1.6T Parameters, 384 Experts with 6 Activated

Gate News, April 22 — Princeton PhD student Yifan Zhang disclosed the complete technical specifications of DeepSeek V4 on X, following a preview he posted on April 19. V4 has 1.6 trillion total parameters; a lightweight variant, V4-Lite, comes in at 285 billion.

The model employs the DSA2 attention mechanism, which combines DeepSeek's earlier DSA (DeepSeek Sparse Attention, introduced in V3.2) with NSA (Native Sparse Attention), using 512-dimensional head embeddings and pairing sparse Multi-Query Attention (MQA) with Sliding Window Attention (SWA). The MoE (Mixture of Experts) layers contain 384 experts, of which 6 are activated per forward pass, executed through a fused MoE mega-kernel. Residual connections use the Hyper-Connections architecture.
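For context, the 384-experts / 6-active configuration describes top-k expert routing: each token's hidden state is scored against all experts, and only the six highest-scoring expert networks run for that token. The sketch below illustrates the general idea with a standard softmax top-k router in PyTorch; the function and variable names are illustrative, and DeepSeek's actual routing (gating function, bias terms, load balancing) has not been published.

```python
import torch
import torch.nn.functional as F

def route_tokens(hidden, router_weight, top_k=6):
    """Generic top-k MoE routing: score every expert per token, keep the
    top_k highest, and renormalize their gate weights to sum to 1.

    hidden:        (num_tokens, d_model)
    router_weight: (d_model, num_experts), e.g. num_experts = 384
    """
    logits = hidden @ router_weight                  # (num_tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    gate_vals, expert_idx = probs.topk(top_k, dim=-1)
    gate_vals = gate_vals / gate_vals.sum(dim=-1, keepdim=True)
    return gate_vals, expert_idx                     # each (num_tokens, top_k)

# Example: route 4 tokens of width 1024 across 384 experts, 6 active each.
tokens = torch.randn(4, 1024)
router = torch.randn(1024, 384)
gates, experts = route_tokens(tokens, router)
print(experts.shape)  # torch.Size([4, 6])
```

With this setup, only 6 of 384 expert networks run per token, which is how a 1.6T-parameter model keeps per-token compute far below its total parameter count.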

Training details revealed for the first time include the Muon optimizer (which applies Newton-Schulz orthogonalization to momentum updates), a 32K-token pre-training context window, and GRPO (Group Relative Policy Optimization) with a KL divergence correction during reinforcement learning. The final context window extends to 1 million tokens, and the model is text-only.
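The Muon detail is specific enough to sketch. In Muon's public reference implementation, each 2-D weight's momentum buffer is orthogonalized with a quintic Newton-Schulz iteration before being applied as the update; the coefficients below come from that open-source implementation, and whether DeepSeek uses the same constants or step count is an assumption.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximate the nearest semi-orthogonal matrix to G (i.e. U V^T
    from its SVD) via a quintic Newton-Schulz iteration. Coefficients
    follow the open-source Muon reference; V4's exact values are unknown."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.float()
    X = X / (X.norm() + eps)          # scale so all singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                       # iterate in the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, momentum_buf, grad, lr=0.02, beta=0.95):
    """One simplified Muon update for a 2-D weight: accumulate momentum,
    orthogonalize the buffer, and step in that direction."""
    momentum_buf.mul_(beta).add_(grad)
    param.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```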
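GRPO, introduced in DeepSeek's earlier DeepSeekMath work, replaces a learned value model with group-relative advantages: several responses are sampled per prompt, and each response's reward is z-scored against its own group. A minimal sketch follows; the clipping constant, KL coefficient, and the choice of KL estimator are assumptions rather than disclosed V4 settings.

```python
import torch

def grpo_advantages(rewards):
    """rewards: (num_prompts, group_size), one scalar reward per sampled
    response; advantages are normalized within each prompt's group."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp, old_logp, ref_logp, advantages, clip_eps=0.2, kl_coef=0.04):
    """Clipped policy-gradient objective plus an explicit per-sample KL
    penalty against a frozen reference policy (one plausible reading of
    the 'KL divergence correction'). All inputs: (num_prompts, group_size)."""
    ratio = (logp - old_logp).exp()
    clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps)
    pg_loss = -torch.min(ratio * advantages, clipped * advantages)
    # k3 estimator of KL(pi || pi_ref): non-negative and low-variance
    diff = ref_logp - logp
    kl = diff.exp() - diff - 1.0
    return (pg_loss + kl_coef * kl).mean()
```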

Zhang is not employed by DeepSeek, and the company has not officially commented on the disclosed information.
