LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 5s; 32,768 input / 1,024 output; NVIDIA HGX™ B300 scaled over InfiniBand (IB) vs. GB300 NVL72. Training: 1.8T MoE, 4,096x HGX B300 scaled over IB vs. 456x GB300 NVL72 scaled over IB. Cluster size: 32,768.
Database join and aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single B300 GPU, and a single GPU from GB300 NVL72 vs. Intel Xeon 8480+.
Projected performance subject to change.
Real-Time LLM Inference
GB300 NVL72 introduces a second-generation Transformer Engine that enables FP4 AI.
Coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time LLM
inference performance for trillion-parameter language models. This advancement is made
possible by a new generation of Tensor Cores, which introduce new microscaling formats
for high accuracy and greater throughput. Additionally, the GB300 NVL72 uses NVLink and
liquid cooling to create a single massive 72-GPU rack that overcomes communication
bottlenecks.
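The microscaling idea behind formats like FP4 is that a small block of tensor values shares a single scale factor, so each element needs only a few bits. The sketch below is purely illustrative (it is not NVIDIA's implementation, and the block size and 4-bit signed range are assumptions): it quantizes a block of 32 floats to 4-bit integers with one shared scale, then dequantizes and measures the error.

```python
# Illustrative sketch of block-scaled ("microscaling") quantization.
# Assumptions: 32-element blocks and a signed 4-bit range of [-7, 7];
# real MX formats differ in scale encoding and element format details.
import numpy as np

BLOCK = 32   # elements sharing one scale factor
QMAX = 7     # max magnitude of a signed 4-bit integer code

def quantize_block(x):
    """Quantize one block to 4-bit integer codes plus a shared scale."""
    scale = max(np.max(np.abs(x)) / QMAX, 1e-12)  # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -QMAX, QMAX).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(BLOCK).astype(np.float32)
q, s = quantize_block(x)
x_hat = dequantize_block(q, s)
err = float(np.max(np.abs(x - x_hat)))  # bounded by scale / 2
```

Because the scale is chosen per block rather than per tensor, outliers in one block do not destroy the precision of every other block, which is why such formats can hold accuracy at very low bit widths.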
Massive-Scale Training
GB300 NVL72 includes a faster second-generation Transformer Engine featuring FP8
precision, enabling a remarkable 4X faster training for large language models at scale.
This breakthrough is complemented by fifth-generation NVLink, which provides 1.8
terabytes per second (TB/s) of GPU-to-GPU interconnect, along with InfiniBand
networking and NVIDIA Magnum IO™ software.
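To put the 1.8 TB/s figure in perspective, a rough back-of-envelope calculation (the 16 GB payload is an assumed example, not a figure from this document) shows how quickly a large tensor can move between two GPUs at that rate:

```python
# Back-of-envelope transfer-time estimate at the stated NVLink bandwidth.
# The payload size is a hypothetical example for illustration only.
NVLINK_BW = 1.8e12   # 1.8 TB/s GPU-to-GPU, per the datasheet
payload = 16e9       # assumed 16 GB of gradients/activations

transfer_s = payload / NVLINK_BW
print(f"{transfer_s * 1e3:.2f} ms")  # prints "8.89 ms"
```

At this bandwidth, per-step gradient exchange becomes a small fraction of step time, which is the point of pairing the interconnect with InfiniBand and Magnum IO for scale-out.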
Energy-Efficient Infrastructure
Liquid-cooled GB300 NVL72 racks reduce a data center's carbon footprint and energy
consumption. Liquid cooling increases compute density, reduces the amount of floor space
used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink
domain architectures. Compared to NVIDIA B300 air-cooled infrastructure, GB300 delivers
25X more performance at the same power while reducing water consumption.
Data Processing
Databases play critical roles in handling, processing, and analyzing large volumes of
data for enterprises. GB300 takes advantage of the high-bandwidth memory performance,
NVLink-C2C, and dedicated decompression engines of the NVIDIA Blackwell architecture to
speed up key database queries by 18X compared to CPUs, delivering 5X better TCO.
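The benchmarked workload is a join and aggregation derived from TPC-H Q4, which counts orders having at least one late line item, grouped by order priority. The toy sketch below illustrates the shape of that query on hand-made data (illustrative only; the benchmark itself uses custom GPU query implementations and compressed inputs):

```python
# Minimal sketch of a TPC-H Q4-style semi-join + aggregation on toy data.
# Table contents here are invented for illustration.
from collections import Counter
from datetime import date

orders = [  # (o_orderkey, o_orderpriority, o_orderdate)
    (1, "1-URGENT", date(1993, 7, 1)),
    (2, "2-HIGH",   date(1993, 8, 15)),
    (3, "1-URGENT", date(1993, 9, 3)),
]
lineitems = [  # (l_orderkey, l_commitdate, l_receiptdate)
    (1, date(1993, 7, 10), date(1993, 7, 20)),  # late: commit < receipt
    (2, date(1993, 8, 20), date(1993, 8, 18)),  # on time
    (3, date(1993, 9, 10), date(1993, 9, 30)),  # late
]

# Semi-join: keys of orders with at least one late line item.
late_orders = {k for k, commit, receipt in lineitems if commit < receipt}

# Aggregation: order count per priority, as in Q4's GROUP BY.
counts = Counter(p for k, p, d in orders if k in late_orders)
print(dict(counts))  # prints "{'1-URGENT': 2}"
```

The join (matching line items to orders) and the grouped count are exactly the memory-bandwidth-bound operations that high-bandwidth memory and hardware decompression accelerate at scale.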