NVIDIA GB300 NVL72: A Cutting-Edge, Rack-Scale AI Computing Platform

The NVIDIA GB300 NVL72 is a rack-scale AI computing platform designed to accelerate large-scale AI workloads, particularly reasoning, inference, and training.


1. Architecture & Core Components

  • GPU/CPU Configuration: The system integrates 72 NVIDIA Blackwell Ultra GPUs and 36 Arm-based NVIDIA Grace CPUs in a single rack, forming a unified exascale computing unit. This design lets the rack operate as a “single massive GPU,” with seamless communication across AI tasks.
  • Memory: Each Blackwell Ultra GPU carries 288 GB of HBM3e memory (up from 192 GB in GB200), achieved via 12-high stacked HBM. Total fast memory per rack reaches 20–40 TB (a back-of-envelope check appears after this list), enabling larger batch sizes and trillion-parameter AI models.
  • Interconnects:
      • 5th-Gen NVLink: Delivers 130 TB/s of aggregate GPU-to-GPU bandwidth, minimizing latency in distributed workloads.
      • ConnectX-8 SuperNICs: Provide 800 Gb/s of networking per GPU (up to 1.6 Tb/s with optical modules), paired with NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet for cluster scalability.
  • Cooling: Fully liquid-cooled with advanced cold plates and quick-disconnect fittings. Supermicro’s implementation uses 40 °C warm water, cutting power consumption by up to 40% compared with air cooling.
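
As a quick sanity check on these figures, the Python sketch below recomputes the rack-level totals from the per-GPU numbers quoted in this article. The 1.8 TB/s per-GPU NVLink bandwidth is an assumption drawn from NVIDIA's published NVLink 5 specification (it is not stated in this article), and decimal units (1 TB = 1,000 GB) are used throughout.

```python
# Back-of-envelope check of the rack-level figures quoted above.
# Decimal units are assumed: 1 TB = 1,000 GB.

NUM_GPUS = 72
HBM3E_PER_GPU_GB = 288      # Blackwell Ultra HBM3e per GPU
CPU_MEMORY_TB = 17          # Grace LPDDR5X total (see spec table below)
NVLINK_PER_GPU_TBS = 1.8    # assumed NVLink 5 per-GPU bandwidth (not from this article)

hbm_total_tb = NUM_GPUS * HBM3E_PER_GPU_GB / 1000
fast_memory_tb = hbm_total_tb + CPU_MEMORY_TB
nvlink_total_tbs = NUM_GPUS * NVLINK_PER_GPU_TBS

print(f"HBM3e per rack:    {hbm_total_tb:.1f} TB")       # ~20.7 TB -> the "20 TB" end
print(f"HBM3e + LPDDR5X:   {fast_memory_tb:.1f} TB")     # ~37.7 TB -> toward the "40 TB" end
print(f"Aggregate NVLink:  {nvlink_total_tbs:.1f} TB/s")  # ~129.6 -> the "130 TB/s" figure
```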

2. Performance & Efficiency

  • AI Inference:
      • 50x higher output compared with NVIDIA Hopper-based systems (see the sketch after this list for one way this multiplier decomposes).
      • 30x faster real-time inference for trillion-parameter LLMs (e.g., DeepSeek R1), driven by FP4 Tensor Cores and second-generation Transformer Engine optimizations.
  • Training: 4x faster training for large models using FP8 precision.
  • Energy Efficiency:
      • 25x better performance per watt vs. H100 GPUs.
      • Liquid cooling reduces data-center carbon footprint and floor-space usage.
  • Throughput: 10x improvement in per-user responsiveness (tokens per second per user) and 5x higher throughput per megawatt over Hopper.
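
Read together, the 50x output figure is consistent with the product of the two per-axis gains above. The short sketch below shows that decomposition explicitly; treating 50x as exactly 10x × 5x is an interpretation of how the multipliers compose, not an official derivation.

```python
# How the headline multipliers may compose (an assumption, not an
# official NVIDIA derivation): per-user speed x per-megawatt throughput.

tps_per_user_gain = 10   # 10x responsiveness (tokens/s per user) vs. Hopper
tps_per_mw_gain = 5      # 5x throughput per megawatt vs. Hopper

factory_output_gain = tps_per_user_gain * tps_per_mw_gain
print(f"Implied overall output gain vs. Hopper: {factory_output_gain}x")  # 50x
```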

3. Technical Specifications

  • GPU Memory: 288 GB HBM3e per GPU (576 TB/s total bandwidth)
  • CPU Memory: 17 TB LPDDR5X (14.3 TB/s bandwidth)
  • Tensor Core Performance: 1,400 PFLOPS (FP4); 720 PFLOPS (FP8/FP6)
  • Power Consumption: 135–140 kW per rack (TDP), with optional battery backup units (BBUs)
  • Networking: 800 Gb/s per GPU via ConnectX-8 SuperNICs; 1.6 Tb/s optical modules
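
Dividing these rack-level totals by the GPU count gives rough per-GPU shares, sketched below. These are derived figures, not official per-GPU specifications; in particular, the rack power budget also covers the Grace CPUs, NICs, and NVLink switches, so the per-GPU power number is an upper bound.

```python
# Rough per-GPU shares derived from the rack-level spec table above.
# Derived figures only -- not official per-GPU specifications.

NUM_GPUS = 72
FP4_PFLOPS_RACK = 1400   # Tensor Core FP4, whole rack
HBM_BW_RACK_TBS = 576    # aggregate HBM3e bandwidth (TB/s)
RACK_POWER_KW = 140      # upper end of the quoted 135-140 kW range

print(f"FP4 compute per GPU:   {FP4_PFLOPS_RACK / NUM_GPUS:.1f} PFLOPS")  # ~19.4
print(f"HBM bandwidth per GPU: {HBM_BW_RACK_TBS / NUM_GPUS:.1f} TB/s")    # 8.0
# Upper bound: the rack budget also powers CPUs, NICs, and switches.
print(f"Power per GPU (<=):    {RACK_POWER_KW / NUM_GPUS * 1000:.0f} W")  # ~1944
```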

4. Use Cases & Industry Applications

  • AI Reasoning & Agentic AI: Optimized for multi-step problem-solving and high-quality response generation in real time.
  • Video Inference & Physical AI: Supports applications such as real-time video generation and autonomous systems.
  • Large Language Models (LLMs): Enables trillion-parameter model training and inference with minimal latency.
  • Enterprise Databases: Accelerates data processing by 18x vs. CPUs through dedicated decompression engines.

5. Industry Deployment & Partnerships

  • Supermicro: Offers air- and liquid-cooled rack solutions, leveraging a modular design for rapid deployment. Their 8U HGX B300 NVL16 systems complement the GB300 NVL72 for diverse data center needs.
  • ASUS: Showcased the GB300 NVL72 in its AI Pod at GTC 2025, featuring 18 compute blades with integrated liquid cooling for SSDs and DPUs.
  • Timeline: Mass production began in May 2025, with full-rack shipments expected in Q3 2025. Major cloud providers like Microsoft are gradually adopting the platform.

6. Cost & Environmental Impact

  • Cost Efficiency: The NVL72’s shared memory architecture reduces expenses for large-batch AI reasoning, offering 10x better tokenomics (illustrated in the sketch after this list).
  • Sustainability: Liquid cooling cuts water usage and operational costs, aligning with green computing initiatives.
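
To make “10x better tokenomics” concrete, the sketch below computes amortized cost per million tokens for a hypothetical baseline rack versus a rack serving 10x the tokens at the same all-in cost. Both the baseline throughput and the dollar figure are placeholders for illustration, not vendor numbers.

```python
# Hedged illustration of "10x better tokenomics": if a rack serves 10x more
# tokens for the same amortized cost, cost per token falls 10x.
# The baseline numbers below are hypothetical placeholders, not vendor figures.

baseline_tokens_per_sec = 100_000   # hypothetical Hopper-class rack throughput
baseline_cost_per_hour = 300.0      # hypothetical all-in $/hour (capex + power)

gb300_tokens_per_sec = baseline_tokens_per_sec * 10   # the article's 10x claim

def cost_per_million_tokens(cost_per_hour, tokens_per_sec):
    """Amortized $ per 1M generated tokens at full utilization."""
    tokens_per_hour = tokens_per_sec * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000

print(f"Baseline: ${cost_per_million_tokens(baseline_cost_per_hour, baseline_tokens_per_sec):.3f}/M tokens")
print(f"GB300:    ${cost_per_million_tokens(baseline_cost_per_hour, gb300_tokens_per_sec):.3f}/M tokens")
```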

Conclusion

The NVIDIA GB300 NVL72 represents a paradigm shift in AI infrastructure, combining unprecedented compute density, memory capacity, and energy efficiency. Its deployment in AI factories and hyperscale data centers positions it as a cornerstone for next-generation AI advancements, from multi-step reasoning to real-time inference on trillion-parameter models. For deeper technical insight, refer to NVIDIA’s official documentation and partner announcements.

