NVIDIA: H100 Hopper Accelerator Now in Full Production, DGX Shipping In Q1’23
by Ryan Smith on September 20, 2022 12:18 PM EST
With NVIDIA’s fall GTC event in full swing, the company touched upon the bulk of its core business in one way or another in this morning’s keynote. On the enterprise side of matters, one of the longest-awaited updates was the shipment status of NVIDIA’s H100 “Hopper” accelerator, which at introduction was slated to land in Q3 of this year. As it turns out, with Q3 already nearly over, H100 is not going to make its Q3 availability date. However, according to NVIDIA, the accelerator is in full production, and the first systems will be shipping from OEMs in October.
First revealed back in March at NVIDIA’s annual spring GTC event, the H100 is NVIDIA’s next-generation high performance accelerator for servers, hyperscalers, and similar markets. Based on the Hopper architecture and built on TSMC’s 4nm “4N” process, H100 is the follow-up to NVIDIA’s very successful A100 accelerator. Among other changes, the newest accelerator from the company implements HBM3 memory, support for transformer models within its tensor cores, support for dynamic programming, an updated version of multi-instance GPU with more robust isolation, and a whole lot more computational throughput for both vector and tensor datatypes. Based around NVIDIA’s hefty 80 billion transistor GH100 GPU, the H100 accelerator is also pushing the envelope in terms of power consumption, with a maximum TDP of 700 Watts.
Given that NVIDIA’s spring GTC event didn’t precisely align with their manufacturing window for this generation, the H100 announcement earlier this year stated that NVIDIA would be shipping the first H100 systems in Q3. However, NVIDIA’s updated delivery goals outlined today mean that the Q3 date has slipped. The good news is that H100 is in “full production”, as NVIDIA terms it. The bad news is that it would seem that production and integration didn’t start quite on time; at this point the company does not expect the first production systems to reach customers until October, the start of Q4.
Throwing a further spanner into matters, the order in which systems and products are rolling out is essentially being reversed from NVIDIA’s usual strategy. Rather than starting with systems based on their highest-performance SXM form factor parts first, NVIDIA’s partners are instead starting with the lower performing PCIe cards. That is to say that the first systems shipping in October will be using the PCIe cards, and it will only be later in the year that NVIDIA’s partners ship systems that integrate the faster SXM cards and their HGX carrier board.
|NVIDIA Accelerator Specification Comparison|
| ||H100 SXM||H100 PCIe||A100 SXM||A100 PCIe|
|FP32 CUDA Cores||16896||14592||6912||6912|
|Memory Clock||4.8Gbps HBM3||3.2Gbps HBM2e||3.2Gbps HBM2e||3.0Gbps HBM2e|
|Memory Bus Width||5120-bit||5120-bit||5120-bit||5120-bit|
|FP32 Vector||60 TFLOPS||48 TFLOPS||19.5 TFLOPS||19.5 TFLOPS|
|FP64 Vector||30 TFLOPS||24 TFLOPS||9.7 TFLOPS (1/2 FP32 rate)||9.7 TFLOPS (1/2 FP32 rate)|
|INT8 Tensor||2000 TOPS||1600 TOPS||624 TOPS||624 TOPS|
|FP16 Tensor||1000 TFLOPS||800 TFLOPS||312 TFLOPS||312 TFLOPS|
|TF32 Tensor||500 TFLOPS||400 TFLOPS||156 TFLOPS||156 TFLOPS|
|FP64 Tensor||60 TFLOPS||48 TFLOPS||19.5 TFLOPS||19.5 TFLOPS|
|NVLink||18 Links (900GB/sec)|| ||12 Links (600GB/sec)||12 Links (600GB/sec)|
|Manufacturing Process||TSMC 4N||TSMC 4N||TSMC N7||TSMC N7|
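As a rough cross-check of the table, theoretical peak memory bandwidth can be estimated from the bus width and per-pin data rate listed above. The sketch below is illustrative only: the bandwidth figures it produces are derived values that do not appear in the table, the formula is the usual pins × data rate ÷ 8 approximation, and `peak_bandwidth_gb_s` and `speedup` are hypothetical helper names. Vendor-quoted numbers may be rounded or differ slightly.

```python
# Figures copied from the specification table above.
SPECS = {
    "H100 SXM":  {"bus_bits": 5120, "gbps_per_pin": 4.8, "fp32_tflops": 60.0},
    "H100 PCIe": {"bus_bits": 5120, "gbps_per_pin": 3.2, "fp32_tflops": 48.0},
    "A100 SXM":  {"bus_bits": 5120, "gbps_per_pin": 3.2, "fp32_tflops": 19.5},
    "A100 PCIe": {"bus_bits": 5120, "gbps_per_pin": 3.0, "fp32_tflops": 19.5},
}


def peak_bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak in GB/sec: pins * Gbit/sec per pin / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


def speedup(new: str, old: str, key: str = "fp32_tflops") -> float:
    """Ratio of one part's quoted throughput to another's (hypothetical helper)."""
    return SPECS[new][key] / SPECS[old][key]


if __name__ == "__main__":
    for name, spec in SPECS.items():
        bw = peak_bandwidth_gb_s(spec["bus_bits"], spec["gbps_per_pin"])
        print(f"{name}: {bw:.0f} GB/sec")
    ratio = speedup("H100 SXM", "A100 SXM")
    print(f"H100 SXM vs. A100 SXM, FP32 vector: {ratio:.2f}x")
```

By this estimate the HBM3-equipped SXM part has about 1.5 times the memory bandwidth of the PCIe card, since both use the same 5120-bit bus and differ only in data rate.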
Meanwhile, NVIDIA’s flagship DGX systems, which are based on their HGX platform and are typically among the very first systems to ship, are now going to be among the last. NVIDIA is opening pre-orders for DGX H100 systems today, with delivery slated for Q1 of 2023, roughly 3 to 6 months from now. This is good news for NVIDIA’s server partners, who in the last couple of generations have had to wait until after NVIDIA’s own systems shipped, but it also means that H100 as a product will not be able to put its best foot forward when it starts shipping in systems next month.
In a pre-briefing with the press, NVIDIA did not offer a detailed explanation as to why H100 has ended up delayed. Speaking at a high level, though, company representatives did state that the delay was not for component reasons. Meanwhile, the company cited the relative simplicity of the PCIe cards as the reason that PCIe systems are shipping first; those are largely plug-and-play inside generic PCIe infrastructure, whereas the H100 HGX/SXM systems were more complex and took longer to finish.
There are some notable feature differences between the two form factors, as well. The SXM version is the only one that uses HBM3 memory (PCIe uses HBM2e), and the PCIe version requires fewer working SMs (114 vs. 132). So there is some wiggle room here for NVIDIA to hide early yield issues, if indeed that's even a factor.
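The SM counts quoted here line up with the CUDA core counts in the specification table. The following sketch shows the arithmetic under two assumptions that are not stated in the article: 128 FP32 CUDA cores per Hopper SM, and 2 FLOPs per core per clock (one fused multiply-add). The function names are hypothetical, and the implied clocks are back-calculated from rounded TFLOPS figures, so treat them as approximate.

```python
FP32_CORES_PER_SM = 128        # assumption: Hopper SM layout, not from the article
FLOPS_PER_CORE_PER_CLOCK = 2   # assumption: one fused multiply-add counts as 2 FLOPs


def cuda_cores(sm_count: int) -> int:
    """FP32 CUDA cores for a given number of enabled SMs."""
    return sm_count * FP32_CORES_PER_SM


def implied_clock_ghz(fp32_tflops: float, sm_count: int) -> float:
    """Clock speed (GHz) that would produce the quoted FP32 vector throughput."""
    flops_per_clock = cuda_cores(sm_count) * FLOPS_PER_CORE_PER_CLOCK
    return fp32_tflops * 1e12 / flops_per_clock / 1e9


if __name__ == "__main__":
    # (enabled SMs, quoted FP32 vector TFLOPS) taken from the article
    parts = {"H100 SXM": (132, 60.0), "H100 PCIe": (114, 48.0)}
    for name, (sms, tflops) in parts.items():
        cores = cuda_cores(sms)
        clock = implied_clock_ghz(tflops, sms)
        print(f"{name}: {sms} SMs, {cores} cores, ~{clock:.2f} GHz implied")
```

Under these assumptions, 132 and 114 SMs give 16896 and 14592 cores, matching the table. The PCIe card enables about 86% of the SXM part's SMs but is quoted at 80% of its FP32 throughput, which suggests it also runs at lower clocks.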
Complicating matters for NVIDIA, the CPU side of DGX H100 is based on Intel’s repeatedly delayed 4th generation Xeon Scalable processors (Sapphire Rapids), which at the moment still do not have a release date completely nailed down. Less optimistic projections have it launching in Q1, which does align with NVIDIA’s own release date, though this may very well just be coincidence. Either way, the lack of general availability for Sapphire Rapids is not doing NVIDIA any favors here.
Ultimately, with NVIDIA unable to ship DGX until next year, NVIDIA's server partners aren't only going to beat them to the punch with PCIe-based systems, but they will be the first out the door with HGX-based systems as well. Presumably those initial systems will be using current-generation hosts, or possibly AMD’s Genoa platform if it’s ready in time. Among the firms slated to ship H100 systems are the usual suspects, including Supermicro, Dell, HPE, Gigabyte, Fujitsu, Cisco, and Atos.
Meanwhile, for customers who are eager to try out H100 before they buy any hardware, H100 is now available on NVIDIA’s LaunchPad service.
Finally, while we’re on the subject of H100, NVIDIA is also using this week’s GTC to announce an update to licensing for their NVIDIA AI Enterprise software stack. H100 now comes with a 5-year license for the software, which is notable since a 5-year subscription is normally $8000 per CPU socket.