The Intel Second Generation Xeon Scalable: Cascade Lake, Now with Up To 56-Cores and Optane!
by Ian Cutress on April 2, 2019 1:02 PM EST
The cadence of Intel’s enterprise processor portfolio is designed to support customers that use the hardware with a guarantee of socket and platform support for at least three years. As a result, we typically get two lots of processors per socket: Sandy Bridge and Ivy Bridge, Haswell and Broadwell, and now Cascade Lake joins Skylake. Intel’s new Second Generation Xeon Scalable (the official name) still comes in the ‘Platinum / Gold / Silver / Bronze’ nomenclature, but this time offering up to 56 cores if you want the processor equivalent of Wolverine at your disposal. Not only is Intel offering more cores, but there’s Optane support, faster DRAM, new configurations, and better specialization than before. Intel also surprised us with better-than-expected hardware support for Spectre and Meltdown mitigations while still providing higher performance overall.
Processor Evolution: Growing Eyes and Limbs
Updating a processor portfolio is a multi-faceted issue. There are the obvious improvements that a company can aim for: more performance, better efficiency, lower power. Then there are the not so obvious improvements that might be customer specific: support for new instructions, layered ecosystem optimizations, support for new technology, or a new product direction. The focus on Intel’s new Second Generation Enterprise Xeon Scalable processors, known as Cascade Lake, is ever so much on the latter.
Cascade Lake builds on a foundation of Skylake by enhancing those secondary characteristics that very often take a back seat to a standard product announcement. The cynic in the room might suggest that as the microarchitecture is the same as the previous generation, there is no improvement, but Intel has enhanced its offering by focusing product implementations, developing special features for emerging markets, enhancing security, and, for those adventurous enough, putting two high-performance processors into a single package. By enhancing the periphery of the product and the ecosystem, a new generation is born.
What does this mean in reality? What characteristics have changed? Here’s a bullet point list, which we will go into in more detail.
- Most of the mid-range processors have more cores for the same price
- Frequency has increased in almost all processors
- L3 has increased in most mid-range processors
- Faster DDR4 is supported
- More DRAM is supported across the stack
- Optane DC Persistent Memory is supported on almost all Gold/Platinum SKUs
- New CPU configurations optimized for specific markets
- Speed Select Technology for Cloud Deployments
- New ‘Cascade Lake-AP’ Platinum 9200 family, up to 56 cores and 400W per socket
- Enhanced Spectre and Meltdown Mitigations
- New AVX-512 VNNI instructions for Emerging Workloads
Each of these has its own story, and showcases that building a processor family has to have an important ecosystem backing it in order to succeed. From speaking with Intel, AMD, and Arm, it is clear that the enterprise user is very different from ten years ago – they want customized and optimized products specifically for them, not so much the off-the-shelf generic part. They’re willing to invest more into something tailored for their market, or, if they want something off-roadmap, unique to them. All the companies in this space are looking to their customers, either to provide them the product they want, or to build out the ecosystem they need as their own workloads change.
Cascade Lake: Naming and Upgrades!
With the silicon that enterprise users will actually want to buy, Intel is still keeping its Platinum, Gold, Silver, and Bronze designations for the second generation. What is new to the stack is the Platinum 9200 family, based on the higher core count BGA models for high-density deployments, as well as a range of new suffixes and letters to help identify some of the differentiated product.
The letters are as follows:
- No letter = Normal Memory Support (1.5 TB)
- M = Medium Memory Support (2.0 TB)
- L = Large Memory Support (4.5 TB)
- Y = Speed Select Models (see below)
- N = Networking/NFV Specialized
- V = Virtual Machine Density Value Optimized
- T = Long Life Cycle / Thermal
- S = Search Optimized
Users familiar with the letters on the first generation Xeon Scalable will notice one missing: F. F was Intel’s letter for SKUs with added OmniPath Fabric on package. Intel stated it will not be making fabric processors for this new generation, which perhaps goes to show you how popular it was.
Out of the new letter configurations, users will notice that the no-letter designation now has support for 1.5 TB of memory, double the 768 GB of the first generation. This goes up to 2 TB for M, and 4.5 TB for L. These values include parts that are fitted with Optane, and as such an ‘L’ CPU can support 3.0 TB of Optane plus 1.5 TB of DDR4 memory, to give that 4.5 TB total. Out of the M/L designation, M seems odd at 2.0 TB, especially for a processor that has six channels of memory. This is because there are a number of motherboards in the market in a ‘2+1+1’ configuration, which relates to how many DIMMs are on each side of the processor. ‘2+1+1’ means that on each side, there is one memory channel having two DIMMs, one with one DIMM, and another with one DIMM. This makes eight per socket, allowing for 2.0 TB divided by eight to equal 256 GB per DIMM slot. Almost confusing.
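The ‘2+1+1’ arithmetic is easier to follow written out. The snippet below is only a sketch of the slot counting described above, assuming 1 TB = 1024 GB; the function names are made up for illustration.

```python
# '2+1+1' layout from the text: DIMM slots per memory channel on one side of the CPU.
DIMMS_PER_SIDE = (2, 1, 1)
SIDES = 2  # memory channels sit on both sides of the socket


def dimm_slots_per_socket(layout=DIMMS_PER_SIDE, sides=SIDES):
    """Total DIMM slots for one socket with the given per-side layout."""
    return sum(layout) * sides


def gb_per_slot(capacity_tb, slots):
    """Capacity each slot must hold to reach capacity_tb (assumes 1 TB = 1024 GB)."""
    return capacity_tb * 1024 / slots


slots = dimm_slots_per_socket()
print(slots)                    # 8
print(gb_per_slot(2.0, slots))  # 256.0
```

So the 2.0 TB limit on ‘M’ parts lines up with eight slots of 256 GB each.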
Out of the other letters, these mostly refer to optimizations for each processor relating to adjustments in core count, frequency, power, manufacturing (for the high thermal parts), and price. The odd one out here is the ‘Y’ Speed Select model, which relates to a new Intel feature we’ll discuss below.
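For anyone sorting through a SKU list, the suffix letters can be treated as a simple lookup. This is a rough sketch, not an Intel tool: it assumes a model number carries at most one trailing letter, the `decode_suffix` helper is hypothetical, the no-letter capacity uses the 1.5 TB standard figure from the generation comparison, and the model strings in the usage lines are purely illustrative.

```python
# Suffix meanings as listed above; "" stands for a model number with no letter.
SUFFIXES = {
    "": "Normal Memory Support (1.5 TB)",
    "M": "Medium Memory Support (2.0 TB)",
    "L": "Large Memory Support (4.5 TB)",
    "Y": "Speed Select",
    "N": "Networking/NFV Specialized",
    "V": "Virtual Machine Density Value Optimized",
    "T": "Long Life Cycle / Thermal",
    "S": "Search Optimized",
}


def decode_suffix(model: str) -> str:
    """Return the meaning of the trailing letter of a model number.

    Assumes at most one trailing letter; anything else raises ValueError.
    """
    last = model.strip()[-1:].upper()
    suffix = "" if last.isdigit() else last
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix {suffix!r} in {model!r}")
    return SUFFIXES[suffix]


# Illustrative model strings only
print(decode_suffix("8280L"))  # Large Memory Support (4.5 TB)
print(decode_suffix("6230"))   # Normal Memory Support (1.5 TB)
```

An ‘F’ suffix raises an error here, which matches the second generation dropping the fabric parts.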
Overall, the changes in the processors between the first and second generation are as follows:
Intel Xeon Scalable

| Second Generation (Cascade Lake) | | First Generation (Skylake) |
|---|---|---|
| April 2019 | Released | July 2017 |
| Up to 28 (up to 56 with Platinum 9200) | Cores | Up to 28 |
| 1 MB L2 per core, up to 38.5 MB Shared L3 | Cache | 1 MB L2 per core, up to 38.5 MB Shared L3 |
| Up to 48 Lanes | PCIe 3.0 | Up to 48 Lanes |
| Up to DDR4-2933, 1.5 TB Standard | DRAM Support | Six Channels, up to DDR4-2666, 768 GB Standard |
| Up to 4.5 TB Per Processor | Optane Support | - |
| AVX-512 VNNI with INT8 | Vector Compute | AVX-512 |
| Variant 2, 3, 3a, 4, | Spectre/Meltdown Mitigations | |
| Up to 205 W (up to 400 W with Platinum 9200) | TDP | Up to 205 W |
Cascade Lake has support for Optane, support for DDR4-2933 (at two DIMMs per channel), better Spectre/Meltdown support, new VNNI instructions, and the new 9200 family of processors. We’ll go through the exact SKUs of the processors now, starting with the new 9200 family.
Cascade Lake-AP: Intel Xeon Platinum 9200 Family
Intel previously announced its intention to build a new high-performance, high-density compute platform. We covered the initial announcement back in November last year, which showed that Intel was placing two of its large enterprise silicon dies into a single package, to create something with double the cores and double the memory in something smaller than a dual socket system. This product, according to Intel, was designed to be purely focused on that high-density compute market, where space is at a premium. The company stated that where HPC previously had two processors, they could now have the equivalent of four.
So Intel now has a name for this family of products: the Xeon Platinum 9200 family. Intel will be offering four parts, and where previously the company stated it would offer up to 48 cores in a single package, it will now offer up to 56, with the top TDP up to 400W. These processors are BGA only, and will only be sold as fundamental Intel server designs by Intel through the OEMs. The OEMs can optimize the design as they see fit, such as offering four blades in a 2U design or going between air and liquid cooling; however, the motherboard/CPU configurations will be fixed by Intel.
Intel Xeon Platinum 9200 Family (Cascade Lake AP)

| Model | Cores / Threads | Base Freq | Turbo Freq | L3 Cache | TDP | Price |
|---|---|---|---|---|---|---|
| Platinum 9282 | 56 C / 112 T | 2.6 GHz | 3.8 GHz | 77.0 MB | 400 W | arm |
| Platinum 9242 | 48 C / 96 T | 2.3 GHz | 3.8 GHz | 71.5 MB | 350 W | leg |
| Platinum 9222 | 32 C / 64 T | 2.3 GHz | 3.7 GHz | 71.5 MB | 250 W | foot |
| Platinum 9221 | 32 C / 64 T | 2.1 GHz | 3.7 GHz | 71.5 MB | 250 W | kidney |
Each CPU will have 40 lanes of PCIe 3.0, and support for twelve memory channels up to DDR4-2933. There is no Optane support, as Intel believes that this sort of high-density compute is aimed at installations that are compute bound, not memory bound. There is reason to believe that some installations won’t even have all the slots for the memory channels, or that they will have only one DIMM per channel.
Given the UPI link layout between the two dies, what Intel is essentially giving its customers is a way to make a super dense equivalent of a four-socket system. On its own with a single socket, the processor will act much like a dual socket system with poor inter-socket communication, so ultimately we see the value in the 9200 family in dual socket designs.
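The ‘two sockets that look like four’ claim can be checked with simple counting. The sketch below only tallies cores and memory channels from the figures in this article; it says nothing about the bandwidth or latency between dies, and the `Package` type is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    cores: int
    memory_channels: int


def system_totals(pkg: Package, sockets: int) -> tuple:
    """Return (total cores, total memory channels) for a multi-socket system."""
    return pkg.cores * sockets, pkg.memory_channels * sockets


ap = Package(cores=56, memory_channels=12)  # top Platinum 9200 part
sp = Package(cores=28, memory_channels=6)   # 28-core, six-channel processor

print(system_totals(ap, sockets=2))  # (112, 24)
print(system_totals(sp, sockets=4))  # (112, 24)
```

On paper the two systems match; the difference is in how the dies talk to each other.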
Intel will not release official pricing for these processors, as they are BGA only and sold to OEMs as systems, not individual parts. We’re trying to get one in to review.
Intel Platinum 8200, Gold 6200, Gold 5200, Silver 4200, and Bronze 3200
The other processors in Intel’s stack follow the nomenclature, and vary in core count (up to 28), frequency (up to 4.0 GHz), TDP (up to 205W), and L3 cache (up to 38.5 MB). The raw processor list is as follows:
|Intel Second Generation Xeon Scalable Family
|Xeon Platinum 8200|
|Xeon Gold 6200|
|Xeon Gold 5200|
|Xeon Silver 4200|
|Xeon Bronze 3200|
It might be difficult to separate why any processor is better than the other; however one of the prime examples of how Intel has redesigned the stack comes with the Xeon Gold 6200 family, which is one of Intel’s most popular price points.
At the same MSRP, Intel is offering one of a few things, typically either a higher frequency or more cores. As a result, we’re going to see a lot of cases where Intel and its customers will claim to offer ’25-40% more performance!’ – this is going to be true in terms of performance per dollar, but the per-core performance for a processor with that core count is only going to be boosted by the frequency adjustment. Intel and its partners are not going to stop promoting that they offer a better performance per dollar proposition, although it won’t always be immediately obvious that the raw per-clock performance hasn’t changed.
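To see how both statements can be true at once, here is a back-of-the-envelope sketch. The core counts, frequencies, and price are hypothetical and not taken from any SKU list, and throughput is crudely modeled as cores multiplied by frequency, which assumes identical per-clock performance and perfect scaling.

```python
def relative_throughput(cores, ghz):
    """Crude throughput proxy: cores x frequency.

    Assumes identical per-clock performance and perfect multi-core scaling,
    both of which are simplifications.
    """
    return cores * ghz


def compare(old, new, price):
    """Compare two (cores, ghz) parts sold at the same price."""
    old_tp = relative_throughput(*old)
    new_tp = relative_throughput(*new)
    return {
        "perf_per_dollar_gain": (new_tp / price) / (old_tp / price) - 1,
        "per_core_gain": new[1] / old[1] - 1,
    }


# Hypothetical parts: four extra cores, same clock, same hypothetical price
result = compare(old=(16, 2.0), new=(20, 2.0), price=1000)
print(f"{result['perf_per_dollar_gain']:.0%}")  # 25%
print(f"{result['per_core_gain']:.0%}")         # 0%
```

Under these assumptions the marketing figure comes entirely from the extra cores, while each individual core is no faster.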
Intel’s Optane DC Persistent Memory: Actual DRAM, or RAMDisk?
The one thing that will forever give me nightmares is (insert echo microphone) ‘The Pyramid of Optane!’. This is the diagram Intel brings out at every event where they have to speak about Optane. I now dream of this pyramid.
Intel’s new Optane technology is divided into two segments, bridging the gap between DRAM and storage. The storage product is a competitor to 3D NAND: a low-capacity, high-performance storage layer that uses NVMe. The memory product comes in the DDR4 form factor and offers a high-capacity, slightly slower alternative to DRAM. With the new Cascade Lake processors, Intel now supports this DDR4-compatible memory version of Optane.
Intel promotes Optane DC Persistent Memory (the official name) in two ways.
First, as a way to add much more memory to the system at a slightly slower speed. With the high-memory compatible processors, users can add up to 3.0 TB of Optane DCPMM to 1.5 TB of DDR4, to give 4.5 TB of total memory per socket. In this configuration, the DDR4 acts as a buffer/store in front of the Optane, which hides some of the increased latency, but ultimately it means that larger in-memory databases can be used per socket.
Secondly, as a persistent memory store. The number that Intel likes to quote here is when a system restarts and needs to reload all of its data from NVMe into main memory. With persistent memory, the data is already there, like a storage device, and as a result it can reduce downtime from loading the database from forty minutes down to four, which for a warm reboot reduces the downtime quite considerably.
For the enterprise customer actually using Optane, it will be visible to the system in two different ways: Memory Mode or App Direct mode. In Memory Mode, the Optane just molds into the DDR4 memory, so it looks like there’s a lot more DRAM in the system. In this mode, it is very easy to see the benefits of just having more memory. In App Direct mode, the Optane is actually mounted like a storage device, but with the performance of a RAM Disk. In order to take advantage of this mode, the software may have to be redesigned in order to use this extra storage-like device.
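To give a feel for what ‘redesigned software’ might look like, one common pattern for storage-like memory is to memory-map a file and update it with ordinary loads and stores. The sketch below assumes that pattern; it is not taken from Intel’s material, and it uses an ordinary temporary file as a stand-in, so the path and helper names are hypothetical and nothing here touches real Optane hardware.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in: on a real system this would be a file on a
# persistent-memory-backed filesystem; here it is an ordinary temp file.
path = os.path.join(tempfile.mkdtemp(), "pmem_demo.bin")
SIZE = 4096


def write_record(path, payload: bytes):
    """Store payload at the start of a memory-mapped region and flush it."""
    with open(path, "w+b") as f:
        f.truncate(SIZE)                     # size the backing file
        with mmap.mmap(f.fileno(), SIZE) as region:
            region[:len(payload)] = payload  # plain store into mapped memory
            region.flush()                   # push the update to the backing file


def read_record(path, length: int) -> bytes:
    """Re-open the mapping, as after a restart, and read the data back."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ) as region:
            return bytes(region[:length])


write_record(path, b"in-memory database page")
print(read_record(path, 23))
```

The point of the sketch is the access style: the data is addressed like memory, yet it is still there when the mapping is opened again.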
In speaking with OEMs, they expect customers that need large database storage to be the first to acquire Optane, as the larger memory footprint is easier to amortize effectively. The RAM-Disk ‘App Direct’ mode is going to take longer to realize.
Intel will offer Optane in 128 GB, 256 GB, and 512 GB modules, and the way that it works means that up to six Optane modules can be installed per socket (with at least one DDR4 module as well). Intel has stated that it will not be releasing pricing for Optane at this time, and in speaking to OEMs, their Optane based solutions should be on general availability sometime in June.
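The module sizes map onto the capacity figures quoted earlier. As a quick sketch, assuming all six Optane slots hold the same module size, 1 TB = 1024 GB, and the 1.5 TB of DDR4 quoted for the large-memory parts sits alongside (whether a given SKU’s memory limit allows a particular total is a separate question):

```python
OPTANE_MODULE_GB = (128, 256, 512)  # module sizes Intel will offer
MAX_OPTANE_MODULES = 6              # per socket
DDR4_TB = 1.5                       # DDR4 alongside Optane on large-memory parts


def optane_tb(module_gb, modules=MAX_OPTANE_MODULES):
    """Optane capacity per socket in TB (assumes 1 TB = 1024 GB)."""
    return module_gb * modules / 1024


for size in OPTANE_MODULE_GB:
    print(size, optane_tb(size), optane_tb(size) + DDR4_TB)
# 128 0.75 2.25
# 256 1.5 3.0
# 512 3.0 4.5
```

Only the 512 GB modules reach the 3.0 TB of Optane behind the 4.5 TB headline number.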
Cascade Lake: More Than Just Moore (Of The Same)
If I were to be blunt, then I’d say that at the super high level, the new Cascade Lake processor family is essentially a refresh of the previous generation Skylake Xeon Scalable processors. It’s the same core microarchitecture underneath, with some memory increases or core count adjustment, and a cynic might point out that raw CPU performance per clock doesn’t change. If Intel were launching just a processor, I might agree, but these enterprise launches are now so much more about what supports the processor than the raw performance itself. Capabilities such as Optane, along with added security and new elements to the product portfolio, push Cascade Lake above ‘just another launch’.
One of the big questions will be if Intel can compete with 28 cores on 14nm when AMD is ready to roll out 64 cores on 7nm, and how the performance will differ. One of the clever things Intel has done in this contest is to draw the talk away from just quoting core counts, and help build a platform around its product. AMD is smart; they’ll be doing exactly the same, and they’re getting ready to launch 64 cores later this year. It will make for an interesting head to head battle.
For users expecting to see a full launch review of Intel’s Cascade Lake today, unfortunately, the time between sampling and the launch was too tight to bring something together. Johan and I are currently working on several ideas, along with testing Optane, in order to bring you a full review when it is ready. This is likely to take at least a couple of weeks, but we have some exciting content ahead. Stay tuned!