Nvidia Reveals Blackwell: The ‘World’s Most Powerful Chip’ for AI

At Nvidia’s 2024 GTC AI conference, the company unveiled the much-anticipated Blackwell platform. The platform consists of a new graphics processing unit (GPU), dubbed the “world’s most powerful chip”; the GB200 NVL72 rack-scale system; and a set of enterprise AI tools. Major cloud service providers have announced plans to use Blackwell to advance generative AI, deep learning, and cloud-based computing services.


Blackwell GPU in two dies with on-chip HBM3e.

In 2012, AlexNet kicked off the GPU-based AI boom with a 60 million-parameter computational model. Today’s models are crossing 10 trillion parameters, more than a 160,000-fold increase in complexity, with no end in sight. Blackwell, with multi-die GPUs, co-processing CPUs, and terabyte-scale interconnects, is Nvidia’s 2024 answer to this dramatic upward spiral of computing demand.
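The growth figure above is easy to sanity-check with a back-of-envelope calculation (the parameter counts are the article’s round numbers, not exact model sizes):

```python
# Scale of the parameter growth described above (round numbers).
alexnet_params = 60e6       # AlexNet (2012): ~60 million parameters
frontier_params = 10e12     # today's largest models: ~10 trillion parameters

fold_increase = frontier_params / alexnet_params
print(f"{fold_increase:,.0f}x")  # 166,667x, i.e. more than 160,000-fold
```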


Nvidia Unveils the 208-Billion Transistor Blackwell GPU

Nvidia designed the Blackwell GPU to be the world’s largest GPU, built specifically for datacenter-scale generative AI. It offers up to 25 times better energy efficiency than Nvidia’s prior-generation GPUs. The Blackwell architecture was named in honor of David Harold Blackwell, a statistician and mathematician who specialized in game theory, probability theory, and statistics and was the first Black scholar inducted into the National Academy of Sciences. The architecture succeeds Hopper, Nvidia’s previous AI flagship.

The GPU is built from two dies joined by a 10 TB/s (terabyte-per-second) chip-to-chip interconnect, creating a single unified GPU. Not long ago, reticle limits (the maximum die size dictated by lithography optics) were a major obstacle to advancing high-performance computing chips. Ultra-fast die-to-die interconnects have effectively removed that limit, and the Blackwell-architecture GPUs take full advantage of them. In total, the GPU packs 208 billion transistors fabricated on a custom TSMC 4NP process node.


Blackwell: Built on the Back of Six Innovations

The Blackwell architecture is characterized by six major innovations, outlined below.


AI Superchip

Nvidia considers Blackwell a new class of AI superchip, with performance that exceeds anything previously offered by Nvidia or competing processor vendors. It backs that claim with the five additional GPU innovations below.


Second-Generation Transformer Engine

The Blackwell GPUs are built around Nvidia’s second-generation transformer engine, which pairs a custom tensor core architecture with TensorRT-LLM and NeMo Framework innovations. The resulting processing engine accelerates both training and inference for large language models (LLMs) and mixture-of-experts (MoE) models. The new tensor cores add support for additional precisions, including community-defined micro-scaling formats, enabling four-bit floating point (FP4) inference that doubles performance and the model sizes supported while maintaining high accuracy.
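The “micro-scaling” idea (many low-precision values sharing one scale factor per small block) can be sketched in a few lines of plain Python. This is a toy illustration of block-scaled 4-bit quantization, not the OCP MX specification or Nvidia’s FP4 implementation:

```python
def mx_quantize(block, max_mag=7):
    """Toy block-scaled ('micro-scaling') quantization: one shared
    scale per block, each element a small signed integer (fits in 4 bits)."""
    scale = max(max(abs(x) for x in block) / max_mag, 1e-12)
    q = [round(x / scale) for x in block]   # integer codes in [-7, 7]
    return q, scale

def mx_dequantize(q, scale):
    """Reconstruct approximate values from codes and the shared scale."""
    return [v * scale for v in q]

weights = [0.10, -0.42, 0.33, 0.05]
q, scale = mx_quantize(weights)
approx = mx_dequantize(q, scale)  # coarse reconstruction of the weights
```

The design point being illustrated: storing one higher-precision scale per block lets each individual element drop to four bits while bounding the reconstruction error to half a quantization step.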


Secure AI

Blackwell addresses the increasing need for advanced security with Nvidia Confidential Computing, which leverages strong hardware-based security to prevent unauthorized access. It is the first GPU capable of trusted execution environments for I/O (TEE-I/O). TEE-I/O works as a hardware security layer in concert with innovation number four, the fifth-generation NVLink and NVLink Switch.


NVLink and NVLink Switch

Exascale computing with trillion-parameter AI models relies on communication as much as computation. No GPU operates alone, and interconnect performance is make or break for modern AI computing. NVLink is Nvidia’s GPU interconnect technology, enabling clusters of up to 576 GPUs. Fifth-generation NVLink provides 1.8 TB/s of bidirectional bandwidth per GPU, and the NVLink Switch chip scales that to 130 TB/s of aggregate GPU bandwidth in a 72-GPU domain (NVL72).
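The two bandwidth figures are consistent with each other, as a quick check shows (treating the aggregate figure as per-GPU bandwidth summed across the domain):

```python
# Sanity check: per-GPU NVLink bandwidth times the NVL72 domain size.
per_gpu_tbps = 1.8   # fifth-generation NVLink, TB/s per GPU
domain_gpus = 72     # GPUs in an NVL72 domain

aggregate_tbps = per_gpu_tbps * domain_gpus
print(round(aggregate_tbps, 1))  # 129.6 TB/s, quoted as ~130 TB/s
```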


Decompression Engine

The Blackwell GPUs work with the Grace CPU, connected over a 900 GB/s bidirectional link, to accelerate mass data handling. The dedicated decompression engine accelerates the full database query pipeline for data in multiple compression formats, such as LZ4, Deflate, and Snappy.
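Deflate, one of the formats listed, is available in Python’s standard library, which makes the round trip the engine accelerates easy to demonstrate in software (LZ4 and Snappy would require third-party packages):

```python
import zlib

# Round-trip a batch of records through Deflate, one of the
# compression formats the decompression engine handles in hardware.
records = b"order_id,sku,qty\n1001,A-17,3\n" * 1000

compressed = zlib.compress(records, level=6)
restored = zlib.decompress(compressed)

assert restored == records
print(f"{len(records):,} bytes -> {len(compressed):,} bytes")
```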


Reliability, Availability, and Serviceability (RAS) Engine

Processing and communicating may be at the core of AI, but fault tolerance and predictive maintenance are equally important to delivering reliable output. Nvidia’s RAS engine identifies and localizes the sources of faults by continuously monitoring thousands of hardware and software data points.


The Blackwell GB200 Superchip and NVL72 Server

The GB200 Superchip, configured for maximum performance, combines two Blackwell GPUs with one Nvidia Grace CPU. It carries up to 384 GB of high-bandwidth memory 3e (HBM3e) with up to 16 TB/s of memory bandwidth. Multiple GB200s can be clustered with Nvidia’s new Quantum-X800 InfiniBand and Spectrum-X800 Ethernet networking at speeds up to 800 Gb/s.
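Those totals compose from the two GPUs on the superchip (the 192 GB and 8 TB/s per-GPU figures used below are assumptions, the commonly cited per-GPU splits). One derived number gives a feel for the scale:

```python
# How the GB200 memory figures compose (per-GPU values are assumptions).
gpus = 2
hbm_per_gpu_gb = 192        # -> 384 GB total HBM3e
bw_per_gpu_tbps = 8.0       # -> 16 TB/s total memory bandwidth

total_hbm_gb = gpus * hbm_per_gpu_gb
total_bw_tbps = gpus * bw_per_gpu_tbps

# Time to stream the entire HBM pool once at peak bandwidth:
seconds = (total_hbm_gb / 1000) / total_bw_tbps
print(f"{total_hbm_gb} GB at {total_bw_tbps} TB/s -> {seconds * 1000:.0f} ms")
```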


GB200 Superchip with two Blackwell GPUs and one Grace CPU



Nvidia also introduced its take on the server rack, the GB200 NVL72. The system combines 36 GB200 superchips (for a total of 72 GPUs) in a liquid-cooled enclosure. The GB200 NVL72 acts as a single massive GPU, delivering a 30X performance increase and a 25X reduction in total cost of ownership (TCO) compared with the prior-generation H100-based system.


The GB200 NVL72 server with up to 36 GB200 superchips
No Shortage of Partners 

It’s no surprise that most of the data center and AI industry has lined up behind Blackwell. Nvidia pioneered GPUs for massively parallel processing and has continued to push innovation since. Google/Alphabet expects to leverage Blackwell in its shift to AI across its cloud platform and its DeepMind initiative. Amazon, a long-time collaborator with Nvidia, will integrate Blackwell into AWS. Nvidia will, in turn, continue to co-develop Project Ceiba, an advanced networking and AI research project, with Amazon. Dell, Facebook/Meta, Microsoft, OpenAI, Oracle, and Tesla have also announced plans to integrate Blackwell into their AI initiatives.



All images used courtesy of Nvidia. 
