Kaiming He's MIT Team Releases ELF, a Language Diffusion Model Trained on About 45B Tokens

According to Beating, Kaiming He’s team at MIT recently released ELF (Embedded Language Flows), a language diffusion model that departs from the autoregressive “predict the next token” approach used by GPT-style models. Instead, ELF generates text in a continuous embedding space and converts the result to discrete tokens only in the final step.
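To make the idea of generating in embedding space and discretizing only at the end concrete, here is a minimal toy sketch. It is not ELF's implementation: the vocabulary, the 2-D embedding table, the straight-line flow, the Euler sampler, and the nearest-neighbor decoding step are all illustrative assumptions, and `toy_velocity` is a hypothetical stand-in for a trained network (it is handed the target embedding directly, which a real model would have to learn to predict).

```python
import random

# Hypothetical toy vocabulary and 2-D embedding table (assumptions, not ELF's).
VOCAB = ["the", "cat", "sat", "mat"]
EMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity network.

    For a straight-line path x_t = (1 - t) * noise + t * target,
    the velocity at x_t is (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_embeddings(targets, steps=32, seed=0):
    """Euler-integrate from Gaussian noise (t=0) toward the data end (t=1)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in tgt] for tgt in targets]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        xs = [
            [xi + dt * vi for xi, vi in zip(x, toy_velocity(x, t, tgt))]
            for x, tgt in zip(xs, targets)
        ]
    return xs


def decode(xs):
    """Final step only: map each continuous vector to its nearest token."""
    tokens = []
    for x in xs:
        dists = [sum((a - b) ** 2 for a, b in zip(x, e)) for e in EMBED]
        tokens.append(VOCAB[dists.index(min(dists))])
    return tokens


targets = [EMBED[0], EMBED[1], EMBED[2]]
print(decode(sample_embeddings(targets, steps=32)))  # ['the', 'cat', 'sat']
```

All 32 sampling steps operate on real-valued vectors; tokens appear only in the single `decode` call at the end, which is the structural contrast with token-by-token autoregressive generation.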

On the OpenWebText unconditional generation benchmark, the 105M-parameter ELF-B reached a generation perplexity (Gen. PPL) of approximately 24.1 with 32-step sampling, outperforming multiple discrete and continuous diffusion language model baselines. Notably, ELF-B required only about 45 billion training tokens, roughly one order of magnitude fewer than comparable methods, which typically exceed 500 billion tokens.
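For readers unfamiliar with the two figures above, the sketch below shows the standard perplexity formula (the exponential of the mean negative log-likelihood per token) and the arithmetic behind the "one order of magnitude" comparison. The report does not say which evaluator model scored ELF's samples, so the log-probabilities here are made-up placeholders, and the function name is hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Placeholder scores: if an evaluator assigned every token probability 1/24.1,
# the perplexity would be 24.1 (the value reported for ELF-B).
placeholder_logprobs = [math.log(1 / 24.1)] * 10
print(round(perplexity(placeholder_logprobs), 1))  # 24.1

# Training-token comparison, using the figures quoted in the article.
elf_tokens = 45e9
baseline_tokens = 500e9  # lower bound quoted for comparable methods
ratio = baseline_tokens / elf_tokens
print(round(ratio, 1))  # 11.1
```

A ratio of about 11 is what the article rounds to "one order of magnitude"; because 500 billion is described as a floor for the baselines, the actual gap may be larger.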

