Gate News, April 22 — Princeton PhD student Yifan Zhang disclosed complete technical specifications for DeepSeek V4 on X, following a preview on April 19. V4 has 1.6 trillion total parameters; a lightweight variant, V4-Lite, has 285 billion.
The model employs the DSA2 attention mechanism, which combines DeepSeek's earlier DSA (DeepSeek Sparse Attention, introduced in V3.2) and NSA (Native Sparse Attention) with 512-dimensional head embeddings, paired with sparse Multi-Query Attention (MQA) and Sliding Window Attention (SWA). The MoE (Mixture of Experts) layers contain 384 experts, of which 6 are activated per forward pass, executed via a fused MoE mega-kernel. Residual connections use the Hyper-Connections architecture.
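The routing arithmetic behind "384 experts, 6 activated per forward pass" can be sketched as follows. The function names, tensor shapes, and softmax-over-top-k gating here are illustrative assumptions, not DeepSeek's actual implementation:

```python
import numpy as np

def route_token(x, gate_w, top_k=6):
    """Pick the top_k experts for one token and normalize their scores.

    x      : (d,) hidden state of one token
    gate_w : (d, num_experts) router weight matrix (hypothetical shape)
    """
    logits = x @ gate_w                        # one router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the 6 best experts
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()                    # gating weights sum to 1

rng = np.random.default_rng(0)
d, num_experts = 64, 384
experts_idx, gates = route_token(rng.normal(size=d),
                                 rng.normal(size=(d, num_experts)))
print(sorted(int(i) for i in experts_idx), float(gates.sum()))
```

Only the 6 selected experts' feed-forward networks run for this token, which is how a 1.6T-parameter model keeps per-token compute far below its total size.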
Training details revealed for the first time include the Muon optimizer (which applies Newton-Schulz orthogonalization to momentum updates), a 32K-token pre-training context window, and GRPO (Group Relative Policy Optimization) with KL-divergence correction during reinforcement learning. The final context window extends to 1 million tokens. The model is text-only.
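The Newton-Schulz orthogonalization step that Muon applies to momentum updates can be sketched as below. The quintic coefficients follow the public Muon reference implementation; the step count, shapes, and demo matrix are illustrative assumptions, not details from the V4 disclosure:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately replace G's singular values with ~1, keeping its
    singular vectors, via an iterated odd polynomial. This is the step
    Muon applies to the momentum matrix of a 2D weight; sketch only."""
    a, b, c = 3.4445, -4.7750, 2.0315        # coefficients from the public Muon code
    X = G / (np.linalg.norm(G) + 1e-7)       # Frobenius-normalize so singular values <= 1
    transposed = G.shape[0] > G.shape[1]
    if transposed:                           # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # pushes each singular value toward 1
    return X.T if transposed else X

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8))                  # stand-in for a momentum matrix
O = newton_schulz_orthogonalize(M)
print(O.shape, np.linalg.svd(O, compute_uv=False).round(2))
```

The point of the step is that the parameter update keeps the momentum's direction information while equalizing its scale across singular directions, rather than converging to an exactly orthogonal matrix.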
Zhang is not employed by DeepSeek, and the company has not officially commented on the disclosed information.
Related Articles
OpenAI Releases Open-Source Privacy Filter Model for PII Detection and Redaction
OpenAI's Privacy Filter is an open-source, locally run model with a 128k-token context window that detects and redacts PII in text. It identifies many PII categories, including contact, financial, and credential data, and is intended for privacy-preserving workflows such as data preparation, indexing, logging, and moderation.
GateNews · 30m ago
OpenAI Plans to Deploy 30GW Computing Power by 2030
OpenAI intends to reach 30GW of computing power by 2030 to accommodate growing AI demands, having already completed 8GW of a 10GW target for 2025. The move reflects a strategic expansion of infrastructure to support next-generation AI development and deployment.
GateNews · 31m ago
360 AI Vulnerability Discovery Agent Finds Nearly 1,000 Zero-Day Exploits, Competing with Mythos
Abstract: A Bloomberg-cited report notes that 360 Digital Security Group’s AI-driven Vulnerability Discovery Agent identified nearly 1,000 previously unknown vulnerabilities in recent months, including in Microsoft Office and the OpenClaw framework. The firm says AI has become the core engine of vulnerability discovery and has announced an AI tool to accelerate exploit-chain construction. Benincasa characterizes 360 as a competitor to Anthropic’s Mythos, based on Natto Thoughts’ review of the company’s Chinese-language announcements.
GateNews · 36m ago
Anthropic CEO Heads to the White House for a Break-the-Ice Meeting: Meets with the Chief of Staff and Bessent to Discuss Mythos
The Wall Street Journal reports that Anthropic CEO Amodei met privately with White House officials on April 17, focusing on Mythos's national-security boundaries and responsible deployment. The White House called the meeting constructive, and the market views it as a thaw in relations. The main point of contention: the military wants Claude available for all lawful purposes, while Anthropic insists on discretion under its own acceptable-use policy. Both sides say they will continue the dialogue and meet again before Mythos goes live in May.
ChainNews · Abmedia · 2h ago
Google Ironwood TPU: 10x performance + four partners taking on Nvidia
According to Bloomberg's in-depth reporting and Google's official announcements, on April 22 Google expanded its lineup of in-house AI chips: it made Ironwood, its seventh-generation TPU dedicated to inference, generally available on Google Cloud, and simultaneously launched next-generation design collaborations with four partners: Broadcom, MediaTek, Marvell, and Intel. The goal is to use a customized chip supply chain to challenge Nvidia's leading position in the AI compute market.
Ironwood: Seventh-generation TPU, first inference-dedicated by design
Ironwood is the seventh-generation product in Google's TPU series and the first inference-dedicated chip under the strategy of splitting training from inference. Among the specifications Google disclosed: peak performance per chip is T
ChainNews · Abmedia · 2h ago
DeepSeek discusses its first round of external funding, valuation at $20 billion: China’s AI valuation hits a new high
According to a Bloomberg report on April 22 (citing The Information's exclusive), Chinese AI startup DeepSeek is in talks for its first round of external fundraising, at a valuation of $20 billion. This is DeepSeek's first outside fundraising since its founding in 2023; previously it was funded entirely by the quant hedge fund High-Flyer Capital Management. The $20 billion valuation is also a milestone for Chinese AI startups, the first to push well past the $10 billion tier.
Fundraising size and intended use of funds
DeepSeek is seeking at least $300 million in this first round. The $20 billion valuation doubles the $10 billion-plus figure first reported by The Information on April 17.
ChainNews · Abmedia · 2h ago