How to Tarnish Platinum: Sell It as Xeon 9200
by Dr. Ian Cutress on January 2, 2020 8:00 AM EST
In 2019, Intel announced its Cascade Lake family of enterprise processors, and sitting at the top of the stack was the Cascade Lake-AP family: a quartet of parts that changed Intel’s paradigm for high-end processors. This hardware placed two of Intel’s large 28-core silicon dies in one package, forming a weakly linked dual-processor system that is presented as a single processor with up to 56 cores, 12 memory channels, and a TDP of up to 400 W. Despite not providing pricing, Intel is keen to promote the Xeon 9200 as its extreme performance platform against AMD’s 64-core EPYC "Rome" offering. We saw a number of Xeon 9200 systems on display at the Supercomputing 2019 show, and the discussions we had were interesting in their own right.
For readers who aren’t familiar with the Xeon 9200 series, or with Intel’s enterprise product portfolio: the Cascade Lake Xeon Scalable family offers a variety of products based on three die sizes. The smallest die, LCC, goes up to 10 cores. The middle-sized die, HCC, offers up to 18 cores. The largest die, XCC, offers up to 28 cores. This means that a product with 10 cores or fewer could be LCC, HCC, or XCC, but a 24-core product can only be XCC. The product lines differ in socket support: Xeon Platinum 8000 scales up to eight sockets, Xeon Gold 6000 up to four, and Xeon Silver 4000 and Xeon Bronze 3000 up to two. Each bracket covers a range of core counts; however, the highest core counts are typically found only in Xeon Platinum, while Xeon Bronze only offers up to six cores.
The Xeon Platinum 9200 series, on the other hand, is something slightly different. Rather than a single die in the package, there are two: specifically, two XCC-sized silicon dies, connected through the package. As a result, the Xeon 9200 CPUs can offer up to 56 cores per package, with double the memory channels of a regular Xeon processor (12 rather than 6). Because each package already carries two silicon dies, this hardware is limited to dual-socket configurations, and a dual-socket Xeon 9200 system behaves much like a traditional quad-socket system.
There are some features unique to this family compared to Intel’s other Xeons. TDP starts at 250 W for the 32-core parts and goes up to 400 W for the full-fat 56-core part. Each of these processors supports AVX-512, and they are key high-performance parts showcased with Intel’s DL Boost AI acceleration. Frequencies peak at 3.8 GHz, with base frequencies of up to 2.6 GHz. When channeling 400 W of power through one package across 56 cores, the per-core frequency isn’t going to be at the top of the stack, but the idea is that with sufficient parallelism, a user can get dual-socket-like performance from a single socket.
The other feature that differentiates this hardware from other Xeons is that it is BGA-only. This means the processors are soldered onto the motherboard and cannot be changed or removed by users. As a result, a server sold with one of these processors has a fixed processor configuration. Consequently, rather than requiring all of its partners and resellers to build their own motherboards and server systems, Intel is manufacturing reference design systems for its partners to resell. Aside from custom installations, such as a supercomputer, there is no way to deviate from the Intel reference design platform.
For these high-performance processors, Intel states that single- and dual-processor configurations using the 250 W parts can be air cooled. Any system using 350 W or higher parts, whether single or dual socket, requires liquid cooling.
On top of all this, it is worth noting that, unlike its regular socketed Xeon platform, Intel does not publicly disclose per-processor pricing for the Xeon 9200 series. People who work with this kind of hardware will point out that the ‘Intel sticker price’ is almost useless for big customers: the major cloud partners and hyperscalers are likely to be paying a fraction of that price, given that they buy servers in quantities of thousands to hundreds of thousands.
Nonetheless, one of the criticisms leveled at Intel is that this leaves the Xeon 9200 processors ‘floating’, and people analyzing the hardware holistically have no way to gauge performance per dollar. Given that Intel’s 28-core 205 W Platinum 8280 has a list price of $10,009, a processor with two of these dies in a single package, sold in a specialist BGA system, is likely to carry a list price of at least double that. Intel doesn’t state whether we should use ~$20,000 for performance-per-dollar comparisons, or something closer to ~$35,000+, given how different it is from Intel’s regular Xeon product line. If in doubt, use the latter, or push Intel to actually put dollar amounts on its products.
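To put those two candidate prices in context, a rough per-core comparison helps, since no performance figures are needed for it. This is a minimal sketch that assumes the 56-core Xeon Platinum 9282 is the package in question and uses Intel’s $10,009 list price for the Platinum 8280; the $20,000 and $35,000 figures are the hypothetical prices discussed above, not anything Intel has published.

```python
# Per-core list-price comparison. The 8280 price is Intel's published list price;
# the Xeon 9200 prices below are hypothetical, since Intel has not disclosed them.
PLATINUM_8280_PRICE = 10_009  # USD
PLATINUM_8280_CORES = 28
XEON_9282_CORES = 56          # assumption: the top 56-core part is being priced


def price_per_core(price_usd: float, cores: int) -> float:
    """Return list price divided by core count, in USD per core."""
    return price_usd / cores


baseline = price_per_core(PLATINUM_8280_PRICE, PLATINUM_8280_CORES)

# Compare each hypothetical package price against the 8280's per-core price.
for assumed_price in (20_000, 35_000):
    per_core = price_per_core(assumed_price, XEON_9282_CORES)
    print(f"${assumed_price:,} -> ${per_core:,.0f}/core "
          f"({per_core / baseline:.2f}x the 8280)")
```

On this simple basis, ~$20,000 is just the 8280’s per-core pricing doubled, while ~$35,000 implies a per-core premium of roughly 75% for the dual-die BGA package.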
On the show floor at Supercomputing, we expected a high concentration of Intel partners and resellers with Xeon 9200 systems on display. As this is Intel’s highest-performance x86 hardware, we would typically expect it to get a sizable amount of floor space, coverage, and partner support. Some of this was borne out: a number of Intel’s key partners did indeed have the hardware on show, in the form of one of Intel’s reference 2U half-width blades, and the smaller the reseller, the more prominently it sat at the front of the booth. If you were lucky, there would also be a dud packaged CPU with a logo on top.
As a member of the press interested in the state of play of the Xeon 9200 family, I asked as many of Intel’s partners as I could about the 9200 systems in front of them. I asked which processor versions they were stocking, whether the hardware had garnered any interest, how customers were approaching the platform, and how many units they expected to sell.
All the smaller resellers said pretty much the same thing: if they stocked any, it would likely be the air-cooled systems. They all talked about people coming to the booth and being interested in learning about the hardware, but none of their customers were ultimately willing to put money down for one, even though the platform had been announced half a year earlier. One reseller, when asked whether they expected to sell even one unit, said ‘no’.
One vendor did say something half interesting. Colfax, a reseller of OEM systems and a major consultant to a number of companies in the industry that run custom software stacks, is going to sell the servers directly on its website. Better still, it will let customers use its web configurator to price up a system before going any further with the purchase. At the time of writing, this configurator is not yet online, but when it is, it should give us some indication of the pricing differences between the Xeon 9200 CPUs (if Intel hasn’t formally disclosed list prices by then).
One of the large OEMs was very clear that it doesn’t plan on stocking or reselling the Xeon 9200 system. Instead, the customers for whom the Xeon 9200 might reasonably be relevant are requesting quad-socket platforms and blades. A true quad-socket system offers more total memory than a Xeon 9200 system, can potentially support Optane, and uses socketed processors that can be swapped and configured easily. It would also be easier to cool.
Ultimately, Intel’s Xeon 9200 processors are trying to solve one specific issue for certain customers: density. With the right configuration and liquid cooling, a customer can fit two 56-core CPUs into a 1U half-width node, or 112 cores and 224 threads; with two such nodes side by side, that is 448 threads of AVX-512 performance in 1U. Very few customers appear to be that density constrained, and those on the boundary are telling resellers that they’d prefer a more configurable, slightly lower-density configuration that doesn’t require liquid cooling infrastructure.
One of Intel’s resellers gave us some insight into its contracts. This company handles a number of university supercomputing contracts: institutions that work with separate research grants, add to their compute power over time, and perhaps spend $20k-$250k per year building their systems. These systems might also be housed off-site. These customers aren’t interested in Xeon 9200. Even bigger customers, spending $1-2m a year, aren’t looking at Xeon 9200 either. Any customer that wants lots of cores and doesn’t have a specialized Intel-only software stack might even look at AMD’s high-core-count offerings, which are easier to cool and offer more memory and more I/O.
Speaking to that, one of the OEMs that provides a number of reference designs for several key supercomputers mentioned that even though they have a strong Intel business, their AMD business is booming, especially for high-performance computing. They see the per-core cost and the overall system cost as big factors, and these customers don’t have any desire to touch the Xeon 9200 family.
With all this being said, Xeon 9200 had a strong presence at Supercomputing. A number of these companies said they put it front and center in their booths as a hook: to get people (and customers) talking about it, and then to discuss their actual needs. But ultimately, the best hardware for the people approaching them was something else.
So Who Exactly is the Xeon 9200 For?
So Intel does have two key wins with the Xeon 9200 hardware. On the TOP500 list of most powerful supercomputers, there are two new entrants with the 9200 hardware.
At #40 is the Lise system, installed at HLRN in Berlin, which is an Atos Bull cluster using Xeon 9242 (48-core) CPUs, no co-processor accelerators, and an Intel Omni-Path interconnect. This system has a theoretical peak throughput of 7.6 PetaFLOPs and a total of 103,680 cores. That works out to 2160 Xeon 9242 processors, which at two CPUs per rack unit would occupy 1080U, or 26 racks (likely more, given power, thermals, cooling, storage nodes, networking nodes, and so on). At 350 W TDP each, CPU power alone would be 0.756 MW, and the list puts the total system power at 1.258 MW.
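The arithmetic behind the Lise figures is short enough to check directly. This sketch reproduces it; the two-CPUs-per-U density and the 42U rack height are our assumptions, chosen because they reproduce the 1080U and 26-rack figures, and a real deployment would need more space for the reasons listed above.

```python
import math

# Inputs from the TOP500 entry and the Xeon Platinum 9242 specification.
TOTAL_CORES = 103_680
CORES_PER_CPU = 48
TDP_WATTS = 350

# Assumptions (not from the TOP500 list): packing density and rack height.
CPUS_PER_RACK_UNIT = 2
RACK_UNITS_PER_RACK = 42

# The core count should divide evenly into whole 48-core packages.
assert TOTAL_CORES % CORES_PER_CPU == 0
cpus = TOTAL_CORES // CORES_PER_CPU

# Round up: a partially filled rack unit or rack still takes up space.
rack_units = math.ceil(cpus / CPUS_PER_RACK_UNIT)
racks = math.ceil(rack_units / RACK_UNITS_PER_RACK)

# CPU TDP only; excludes memory, interconnect, storage, and cooling.
cpu_power_mw = cpus * TDP_WATTS / 1_000_000

print(f"{cpus} CPUs, {rack_units}U, {racks} racks, {cpu_power_mw:.3f} MW of CPU TDP")
```

The roughly 0.5 MW gap between that CPU figure and the listed 1.258 MW total gives a loose sense of what the rest of the system draws, bearing in mind that TDP is not measured power.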
At #69 is the CTS-1 MAGMA cluster at Lawrence Livermore National Laboratory. This is a Penguin Computing Relion cluster, also using Xeon 9242 (48-core) CPUs with no co-processor accelerators and an Intel Omni-Path interconnect. This system has a theoretical peak throughput of 4.6 PetaFLOPs and a total of 62,400 cores, which works out to 1300 Xeon 9242 processors.
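Both peak-throughput figures can be reproduced from the core counts alone. This is a minimal sketch, assuming theoretical peak is computed as cores × base clock × double-precision FLOPs per cycle, with the Xeon 9242’s 2.3 GHz base clock and 32 FLOPs per cycle per core from its two AVX-512 FMA units; those two inputs are our assumptions, not numbers taken from the list entries.

```python
# Theoretical peak estimated as cores x clock x FLOPs per cycle per core.
# Assumptions, not taken from the TOP500 entries quoted in the text:
BASE_CLOCK_HZ = 2.3e9     # Xeon Platinum 9242 base frequency
DP_FLOPS_PER_CYCLE = 32   # two AVX-512 FMA units: 2 units x 8 doubles x 2 ops
CORES_PER_CPU = 48        # Xeon Platinum 9242


def rpeak_petaflops(cores: int) -> float:
    """Return theoretical double-precision peak in PetaFLOPs."""
    return cores * BASE_CLOCK_HZ * DP_FLOPS_PER_CYCLE / 1e15


systems = {"Lise": 103_680, "CTS-1 MAGMA": 62_400}
for name, cores in systems.items():
    print(f"{name}: {cores // CORES_PER_CPU} CPUs, "
          f"{rpeak_petaflops(cores):.1f} PFLOPs peak")
```

Under those assumptions the sketch lands on 7.6 and 4.6 PetaFLOPs, matching the listed figures, along with the same 2160 and 1300 processor counts.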
From #70 all the way down to #500, there are no other Xeon 9200 systems on the list. For these two supercomputers alone, we’re looking at a total of 3460 CPU packages, or 6920 XCC dies. Knowing the size of the XCC die (694 mm²), how many dies fit on a single 300 mm wafer (72), and assuming a yield of somewhere from 65% to 85% on Intel’s 14++ process, that works out to a total of roughly 110-150 wafers.
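The wafer estimate follows from a few lines of arithmetic. This sketch uses the same simplified model as the text: 72 candidate dies per 300 mm wafer, a flat yield between 65% and 85%, and no allowance for binning, so it gives an order of magnitude rather than a production figure.

```python
import math

# Package counts for the two TOP500 systems (cores / 48 cores per Xeon 9242).
LISE_CPUS = 2_160
MAGMA_CPUS = 1_300
DIES_PER_PACKAGE = 2       # each Xeon 9200 package carries two XCC dies
CANDIDATES_PER_WAFER = 72  # 694 mm² XCC dies per 300 mm wafer, per the text


def wafers_needed(good_dies: int, yield_fraction: float) -> int:
    """Wafers required if yield_fraction of each wafer's candidate dies work."""
    good_per_wafer = CANDIDATES_PER_WAFER * yield_fraction
    return math.ceil(good_dies / good_per_wafer)


dies = (LISE_CPUS + MAGMA_CPUS) * DIES_PER_PACKAGE
best_case = wafers_needed(dies, 0.85)   # 85% yield
worst_case = wafers_needed(dies, 0.65)  # 65% yield
print(f"{dies} dies -> {best_case} to {worst_case} wafers")
```

That gives 114 to 148 wafers, which rounds to the 110-150 range quoted above.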
We could compare this to one of the big supercomputers built on regular XCC-based Xeons, such as #3 Frontera, which uses 28-core Xeon Platinum 8280 processors and has a total of 448,448 cores, or ~16,000 CPUs, each a single XCC die. Even a smaller XCC-based supercomputer, such as #42, which uses 20-core Xeon Gold 6248 processors and has a total of 88,400 cores, still accounts for 4420 dies. Summed across all of the XCC-based systems, the dies that go into Xeon 9200 hardware are a tiny fraction of the total, and even if wafers were made specifically for this hardware, it would again be a tiny amount.
Beyond these two systems, it is hard to gauge exactly how widespread Xeon 9200 adoption is. Based on our conversations at the Supercomputing show, partners seemed both amused and bemused at the prospect of selling and supporting the platform to anyone without a sizable budget to build a fresh supercomputer. In some instances, Intel’s partners stated that the lower-core-count 32-core chips didn’t make sense over the standard 28-core parts: despite the four extra cores, the BGA design meant less flexibility, a higher cooling requirement, and, for single-CPU blades, worse die-to-die bandwidth than a standard socketed 2P system.
Before publishing this article, we briefed Intel on it, as we thought it only fair to give the company a chance to address the criticisms directly. We asked Intel for its latest official line on the Xeon 9200 series. As part of that response, Intel also supplied commentary from one or two of its partners that have deployed Xeon 9200-based systems.
From Carolyn Henry, Senior Director of Strategic Marketing of the Intel Data Platforms Group:
"High performance computing is one of the most demanding compute and memory bandwidth workloads and requires some of the most advanced technologies. We introduced the Xeon Platinum 9200 to address these workloads, and the insatiable performance demands of our HPC customers. Intel has over 20 years of delivering leadership product to our HPC customers, as is evident by the number of Intel-based Top500 systems, and we are committed to continuing to delivering HPC leadership through solutions like the Xeon Platinum 9200 and the S9200WK server system product family.
Customers deploying the Intel Xeon Platinum 9200 achieve higher node performance, which in turn drives lower TCO as fewer nodes are required for a fixed performance level. Fewer nodes drive lower node acquisition cost, lower fabric, switching, and cabling cost for highly optimized rack-level deployment.”
Also, from William Wu, VP of Hardware Products at Penguin Computing, which deploys Xeon 9200 solutions:
“Artificial intelligence is permeating across industries and Penguin Computing data science customers require HPC clusters that are designed and built to address the new and demanding workloads brought by the convergence of AI and HPC. The Intel Xeon Platinum 9200 processors deliver breakthrough levels of performance and have enabled us to design, build and deliver groundbreaking solutions for our customers. Together, we are delivering a converged platform that allows AI to accelerate and speed up HPC modeling, as well as manage HPC and AI workloads more effectively."
Beyond this generation, it has been widely assumed that Intel will continue its dual-die ‘AP’ strategy into Cooper Lake Xeons in 2020, Ice Lake Xeons in 2020, and perhaps even Sapphire Rapids Xeons in 2021. By contrast, AMD’s chiplet strategy appears to have helped the company compete by offering more than 1000 mm² of silicon in a single package. Intel needs a chiplet strategy of its own, and based on our discussions at the show, gluing together two XCC dies doesn’t seem to be the right path for anyone beyond a few select customers. Intel needs to be at the forefront of driving performance and innovation, as it showed recently with its Tremont Atom microarchitecture disclosure, which tried new things like dual-decoder groups.
For readers who keep their ears to the ground on enterprise performance, it would have been hard to miss that Intel recently pushed out performance numbers for its Xeon 9200 in a Medium blog post (rather than, say, on Intel’s own website). Some of Intel’s benchmark setups comparing AMD hardware with its Xeon 9200, as well as the software stacks used, were quickly called into question. Intel responded, though not with a mea culpa as such: it admitted a typo, but stated that it didn’t need to use the latest software because it yielded the same performance.
Of course, we often discourage our readers from reading too deeply into first-party benchmarks. As much as we would like these companies to provide fair and balanced testing, even if they did, the results would still have to be taken with a pinch of salt. That’s where third-party testing comes in.
To that end, we’ve been in discussions with Intel to get access to a Xeon 9200 system; however, Intel is only offering a Linux system, and we do the bulk of our testing under Windows. Intel's reasons for only providing Linux system access (rather than Windows) are varied, but mostly revolve around the fact that the customers this product is aimed at typically aren't Windows focused (Windows isn't often used on very high core count systems). That makes Intel rather hesitant to help with any non-Linux testing, as it wants the Xeon 9200 seen in the best possible light. Ultimately, we would love to run a best-vs-best shootout, 2x Xeon 9282 against 2x AMD EPYC 7H12, but as this issue isn't likely to go away any time soon, we’ll need to refine our Linux test suite before we can do that.