Revolutionizing Multi-Die Design with the Universal Memory Interface (UMI)


This article is part of the TechXchange: Chiplets – Electronic Design Automation Insights.

What you’ll learn:

The current challenges involved in incorporating sufficient HBM into multi-die designs.
How a new interconnect technology can address the performance, size, and power issues that could hinder broader adoption of chiplet-based design, including achieving the necessary performance without advanced packaging.
Whether a new die-to-memory interconnect, called the Universal Memory Interface (UMI), holds the promise of addressing the growing “memory wall.”

 

A cruel irony is unfolding at an accelerating pace as we drive into the generative-AI era: While improvements in processor performance that enable the enormous compute requirements of applications like ChatGPT get all of the headlines, a not-so-new phenomenon known as the memory wall risks negating those advances. Indeed, it’s been clearly demonstrated that as CPU/GPU performance increases, the time spent waiting on memory also increases, preventing full utilization of the processors.

With the number of parameters in the generative-AI model GPT-4 reportedly close to 1.4 trillion, artificial intelligence has run head-on into the memory wall. Other high-performance applications aren’t far behind. The rate at which GPUs and AI accelerators can consume parameters now exceeds the rate at which hierarchical memory structures, even on multi-die assemblies, can supply them. The result is a growing number of idle cycles while some of the world’s most expensive silicon waits for memory.
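To put that imbalance in perspective, here’s a rough back-of-the-envelope sketch in Python. The parameter count echoes the figure above; the weight precision, aggregate memory bandwidth, and compute rate are illustrative assumptions chosen only to show how lopsided a single batch-size-1 pass can be, not vendor or benchmark numbers.

```python
# Rough "memory wall" arithmetic; every figure here is an illustrative
# assumption, not a vendor or benchmark number.

params = 1.4e12          # parameter count cited above
bytes_per_param = 2      # assuming 16-bit (FP16/BF16) weights
memory_bw = 3.0e12       # assumed aggregate memory bandwidth, bytes/s (~3 TB/s)
compute_rate = 1.0e15    # assumed accelerator throughput, FLOP/s (1 PFLOP/s)
flops_per_param = 2      # one multiply-accumulate per weight at batch size 1

weight_bytes = params * bytes_per_param
time_memory = weight_bytes / memory_bw                  # stream all weights once
time_compute = params * flops_per_param / compute_rate  # do the math on them

print(f"memory-bound time : {time_memory * 1e3:.0f} ms")
print(f"compute-bound time: {time_compute * 1e3:.1f} ms")
print(f"processor idle    : {1 - time_compute / time_memory:.1%}")
```

With these assumed numbers, streaming the weights takes hundreds of times longer than computing on them, which is exactly the idle-cycle problem described above.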

Traditionally, three approaches have been used to pry open this bottleneck. The easiest—in the days when Moore’s Law was young—was to make faster DRAM chips with faster interfaces. Today, that well is dry. The second approach was to create a wider pathway between the memory array—which can produce thousands of bits per cycle in parallel—and the processor die. Arguably, this has been taken near its practical limit with the 1,024-bit-wide high-bandwidth memory (HBM) interface.
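As a quick sanity check on what a single wide interface buys, here’s a short calculation. The 1,024-bit width comes from the HBM interface mentioned above; the per-pin transfer rate is an assumed, generation-dependent figure.

```python
# Per-stack bandwidth of a 1,024-bit HBM interface. The per-pin transfer
# rate is an assumption; actual rates differ by HBM generation and vendor.

interface_width_bits = 1024   # the 1,024-bit-wide interface noted above
pin_rate_gbps = 6.4           # assumed transfer rate per pin, Gb/s

stack_bw_gb_per_s = interface_width_bits * pin_rate_gbps / 8
print(f"one stack: ~{stack_bw_gb_per_s:.0f} GB/s")   # ~819 GB/s with these numbers
```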

The third alternative is to use parallelism above the chip level. Instead of one stack of HBM dies, use four or eight, each on its own memory bus. In this way, the system architect can expand not just the amount of memory directly connected to the compute die, but also the bandwidth between memory and that die.
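Continuing the same rough arithmetic, the sketch below shows how aggregate bandwidth and capacity scale when each stack sits on its own bus. The per-stack bandwidth and capacity are assumed values for illustration, not specifications.

```python
# Aggregate bandwidth and capacity versus HBM stack count. Per-stack
# bandwidth and capacity are assumed figures for illustration only.

stack_bw_gb_per_s = 819     # GB/s per stack (see the sketch above)
stack_capacity_gb = 24      # GB per stack, an assumed figure

for stacks in (1, 4, 8):
    bw_tb_per_s = stacks * stack_bw_gb_per_s / 1000
    capacity_gb = stacks * stack_capacity_gb
    print(f"{stacks} stack(s): {bw_tb_per_s:.2f} TB/s, {capacity_gb} GB")
```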

Space Challenges at the SiP and Die Levels

The trouble is, this approach runs into two hard limits, both involving real estate. At the system-in-package (SiP) level, there’s no room left for more memory. We’re already filling the largest available silicon interposers. Making room for more memory would mean leaving out some computing dies.

At the die level, there’s a different issue. Computing dies—whether CPU, GPU, or accelerator—are prime real estate, usually built in the most advanced and expensive process technology available. Designers want all of that die area for computing—not for interfaces. They’re reluctant to give up any of it, or any of their power budget, for additional memory channels.

So, it’s a dilemma. Architects need the added memory bandwidth and capacity that more memory channels can bring. But they’re out of area on silicon interposers. And compute designers don’t want to surrender more die area for interfaces.

Fortunately, there’s a solution.

Introducing the Universal Memory Interface

Rather remarkably, one new proposal employing proven technology can relieve both the substrate-level and the die-level real-estate issues. And this, in turn, can push open the memory bottleneck. That technology is the Universal Memory Interface (UMI).


