The AI chip race of the past two years has centered almost entirely on HBM, but as AI applications shift from model training to large-scale inference, the next supply bottleneck may not be HBM alone but HBF (High Bandwidth Flash). On April 30 in San Francisco, David Patterson, Turing Award winner and professor at UC Berkeley, said he believes HBF is likely to become the next memory technology to see demand surge sharply, and could even form a new bottleneck.
Why Turing Award winner David Patterson is bullish on HBF
Discussions about AI memory have so far revolved almost entirely around HBM (high-bandwidth memory). But as AI applications move from model training toward large-scale inference, the next supply bottleneck may no longer be HBM alone, but HBF.
Patterson is a heavyweight in computer science and is regarded as one of the key architects behind RISC. Speaking about what comes after HBM, he noted that while HBF still has many technical hurdles to clear, the HBF being pushed by companies such as SK hynix and SanDisk stands out for delivering large capacity at lower power consumption. Going forward, the core variable for AI systems will not be compute power alone, but whether data can be effectively stored, scheduled, and supplied.
What is HBF? Stacking NAND Flash doesn’t replace HBM—it’s a division of labor
The biggest difference between HBF and HBM is the underlying memory materials. HBM vertically stacks DRAM to provide the high-bandwidth data access capability required by GPUs and AI accelerators, mainly responsible for “rapidly feeding data to compute units.” HBF stacks non-volatile NAND Flash; its core advantage is not ultra-high speed, but providing larger data capacity at lower cost and lower power consumption.
In other words, HBM solves the “speed” problem during AI computation, while HBF solves the “capacity” problem as AI systems become increasingly large. That’s why HBF is not simply replacing HBM, but forming a new memory division of labor together with HBM. HBM handles real-time, high-speed data exchange; HBF takes on storage needs for large-scale intermediate data, context data, and repeatedly accessed data throughout the inference process.
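To make this division of labor concrete, below is a minimal Python sketch of a two-tier store that keeps hot working data in a small, fast HBM tier and spills bulk context to a large HBF tier. The capacities, latencies, and routing policy are illustrative assumptions, not figures from the article or from any vendor.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    """One level of the memory hierarchy (capacities/latencies are illustrative)."""
    name: str
    capacity_gb: float
    read_latency_us: float
    used_gb: float = 0.0
    contents: dict = field(default_factory=dict)

    def put(self, key: str, size_gb: float) -> bool:
        if self.used_gb + size_gb > self.capacity_gb:
            return False  # tier is full; caller spills to the next tier down
        self.contents[key] = size_gb
        self.used_gb += size_gb
        return True

# Assumed, illustrative numbers: a small fast HBM tier and a large slower HBF tier.
hbm = Tier("HBM", capacity_gb=192, read_latency_us=0.1)
hbf = Tier("HBF", capacity_gb=4096, read_latency_us=10.0)

def store(key: str, size_gb: float, hot: bool) -> str:
    """Route hot working data to HBM first; spill cold or bulk data to HBF."""
    if hot and hbm.put(key, size_gb):
        return hbm.name
    if hbf.put(key, size_gb):
        return hbf.name
    raise MemoryError("both tiers are full")

print(store("active_kv_block", 8, hot=True))     # -> HBM
print(store("session_history", 500, hot=False))  # -> HBF
```

The point of the sketch is simply the routing decision: the fast tier holds what the compute units need right now, while the dense tier absorbs everything the inference process must keep around.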
As the AI inference market expands, HBF demand takes center stage
HBF is drawing more attention in 2026 mainly because the focus of the AI market is shifting from training toward inference. AI training involves feeding large amounts of data into a model so it can learn parameters and patterns. AI inference, on the other hand, is the process after the model is trained—producing answers based on user input, executing tasks, remembering context before and after, and continuously making judgments.
In inference scenarios, AI isn’t just answering questions once; it must retain previous conversations, work context, judgment results, tool-call records, and even intermediate data across tasks. The data volume is enormous, and it needs to be repeatedly read and updated.
The problem is that placing all of that data in HBM would be prohibitively expensive, and the capacity simply isn't there. HBM is well suited to the fast data a chip needs immediately, but not to carrying every piece of context and intermediate state generated during inference. Once AI agents, long-context models, multimodal inference, and enterprise-grade AI workflows become widespread, systems will need not just faster memory, but a far larger pool of data that can still be accessed quickly. This is exactly why HBF is drawing favorable attention.
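A rough back-of-the-envelope calculation shows the scale involved. The model dimensions below are assumptions chosen for illustration (a 70B-class transformer with grouped-query attention), not numbers from the article, but they convey why per-GPU HBM measured in tens to low hundreds of gigabytes cannot hold the accumulated context of many long-running sessions.

```python
# Back-of-the-envelope KV-cache size for long-context inference.
# All dimensions are assumptions for illustration, not figures from the article.
layers = 80          # assumed transformer layers
kv_heads = 8         # assumed key/value heads (grouped-query attention)
head_dim = 128       # assumed dimension per head
bytes_per_value = 2  # FP16/BF16

def kv_cache_gb(context_tokens: int, concurrent_sessions: int) -> float:
    # 2x for keys and values, per layer, per token, per session
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * concurrent_sessions / 1e9

print(f"{kv_cache_gb(128_000, 1):.1f} GB for a single 128k-token session")
print(f"{kv_cache_gb(128_000, 1_000):,.0f} GB for 1,000 concurrent sessions")
```

Under these assumptions a single 128k-token session already consumes roughly 40 GB of KV cache, and a thousand concurrent long-lived sessions push the total into the tens of terabytes, which is exactly the territory where a denser, cheaper flash-based tier becomes attractive.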
SK hynix and SanDisk push standardization; HBF demand may surpass HBM from 2038
To pursue higher bandwidth, SK hynix and SanDisk are jointly developing HBF: a 3D stacking technology similar to HBM but built on NAND wafers, aimed at delivering several times the throughput of conventional SSDs, with AI inference as the primary workload.
In February, KAIST electrical engineering professor Kim Jong-ho also pointed out in a technical briefing on HBF that the PC era was defined by the CPU, the smartphone era by low power consumption, and the AI era by memory. He drew a clear line between the roles of HBM and HBF: HBM determines speed, while HBF determines capacity. Kim also predicted that HBF demand may surpass that of HBM starting in 2038.
The logic behind this assessment is that as the AI inference market grows, the amount of real-time context, historical data, and task state the model needs to handle becomes larger. If you rely only on expanding HBM, not only will costs be high, but the overall system power consumption and packaging pressure will continue to rise. If HBF can achieve breakthroughs in bandwidth, packaging, durability, and standardization, it could become the next key memory layer for AI data centers.
From HBM to HBF, the AI race shifts from “computing faster” to “remembering and scheduling data”
In the past, when the market talked about AI semiconductors, the focus was largely on GPUs, advanced process nodes, and HBM supply. Especially after Nvidia’s AI server demand surged, HBM briefly became a core metric for judging the competitive strength of memory makers such as SK hynix, Samsung, and Micron. But Patterson’s comments are a reminder that the bottlenecks in AI infrastructure are becoming more complex.
While AI is in the large-model training race, the focus is on feeding GPUs with ever-higher-bandwidth memory. But once AI enters the stage of large-scale inference and agent applications, the questions become: how can a model maintain context over long periods, how can task state be preserved at low cost, and how can data flow more efficiently among GPUs, HBM, SSDs, flash, and network storage?
Therefore, the next phase of the AI memory competition may no longer be just a battle over HBM capacity, but a reorganization across the entire memory hierarchy. HBM remains important because it determines whether AI chips can compute at high speed. But the emergence of HBF means AI systems are starting to need a new type of data layer positioned between traditional storage and high-bandwidth memory. It may not be the fastest, but it could strike a new balance among capacity, power consumption, and cost.
This also means the next keyword in the AI supply chain may expand from “high-bandwidth memory” to “high-bandwidth flash memory.” HBM addresses the real-time computation bottleneck in AI, while HBF may address the much larger data memory bottleneck in the inference era.