AMD Releases Milan-X CPUs With 3D V-Cache: EPYC 7003 Up to 64 Cores and 768 MB L3 Cache
by Gavin Bonshor on March 21, 2022 9:00 AM EST
There's been a lot of focus on how both Intel and AMD are planning for the future in packaging their dies to increase overall performance and mitigate higher manufacturing costs. For AMD, that next step has been V-Cache, an additional L3 cache (SRAM) chiplet that's designed to be 3D die stacked on top of an existing Zen 3 chiplet, tripling the total amount of L3 cache available. Today, AMD's V-Cache technology is finally available to the wider market, as AMD is announcing that their EPYC 7003X "Milan-X" server CPUs have now reached general availability.
As first announced late last year, AMD is bringing its 3D V-Cache technology to the enterprise market through Milan-X, an advanced variant of its current-generation 3rd Gen Milan-based EPYC 7003 processors. AMD is launching four new processors ranging from 16-cores to 64-cores, all of them with Zen 3 cores and 768 MB L3 cache via 3D stacked V-Cache.
AMD's Milan-X processors are an upgraded version of its current 3rd generation Milan-based processors, EPYC 7003. Adding to its preexisting Milan-based EPYC 7003 line-up, which we reviewed back in June last year, the most significant advancement in Milan-X is its large 768 MB of L3 cache, enabled by AMD's 3D V-Cache stacking technology. The 3D V-Cache die uses TSMC's N7 process node – the same node Milan's Zen 3 chiplets are built upon – and measures 36 mm², with the 64 MB cache die stacked on top of the existing 32 MB of L3 found on each Zen 3 chiplet.
Focusing on the key specifications and technologies, the latest Milan-X AMD EPYC 7003-X processors offer 128 PCIe 4.0 lanes, which motherboard and server vendors can allocate between full-length PCIe 4.0 slots and onboard controllers as they see fit. There are also eight memory controllers, each capable of supporting two DIMMs, which allows the use of eight-channel DDR4 memory.
The overall chip configuration for Milan-X is a giant, nine-chiplet MCM, with eight CCD dies and a large I/O die, and this goes for all of the Milan-X SKUs. Critically, AMD has opted to equip all of their new V-Cache EPYC chips with the maximum 768 MB of L3 cache, which in turn means all 8 CCDs must be present, from the top SKU (EPYC 7773X) to the bottom SKU (EPYC 7373X). Instead, AMD will be varying the number of CPU cores enabled in each CCD. Drilling down, each CCD includes 32 MB of L3 cache, with a further 64 MB of 3D V-Cache layered on top for a total of 96 MB of L3 cache per CCD (8 x 96 = 768).
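The cache arithmetic above can be sanity-checked with a few lines of Python, using only the figures stated in this article:

```python
# Milan-X L3 cache arithmetic, using the figures from the article.
CCDS = 8          # all Milan-X SKUs ship with all eight CCDs present
BASE_L3_MB = 32   # on-die L3 cache per Zen 3 CCD
VCACHE_MB = 64    # 3D V-Cache die stacked on top of each CCD

per_ccd_mb = BASE_L3_MB + VCACHE_MB   # 96 MB of L3 per CCD
total_mb = CCDS * per_ccd_mb          # 768 MB per package

print(f"{per_ccd_mb} MB per CCD, {total_mb} MB total")
# prints "96 MB per CCD, 768 MB total"
```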
In terms of memory compatibility, nothing has changed from the previous Milan chips. Each EPYC 7003-X chip supports eight DDR4-3200 memory modules per socket, with capacities of up to 4 TB per chip and 8 TB across a 2P system. It's worth noting that the new Milan-X EPYC 7003-X chips share the same SP3 socket as the existing line-up and, as such, are compatible with current LGA 4094 motherboards through a firmware update.
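The capacity figures follow directly from the DIMM topology; a minimal sketch, assuming 256 GB DIMMs (the DIMM size is an assumption, not something AMD states here):

```python
# Per-socket and 2P memory capacity for EPYC 7003-X.
CHANNELS = 8            # eight-channel DDR4-3200
DIMMS_PER_CHANNEL = 2   # two DIMMs per channel
DIMM_GB = 256           # assumption: 256 GB modules

per_socket_tb = CHANNELS * DIMMS_PER_CHANNEL * DIMM_GB / 1024  # 4.0 TB
two_socket_tb = 2 * per_socket_tb                              # 8.0 TB
print(per_socket_tb, two_socket_tb)  # prints "4.0 8.0"
```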
AMD EPYC 7003 Milan/Milan-X Processors

|Processor|Cores|Threads|Base (MHz)|Boost (MHz)|L3 Cache|PCIe|Memory|TDP (W)|Price (1KU)|
|---|---|---|---|---|---|---|---|---|---|
|EPYC 7773X|64|128|2200|3500|768 MB|128 x 4.0|8 x DDR4-3200|280|$8800|
|EPYC 7763|64|128|2450|3400|256 MB|128 x 4.0|8 x DDR4-3200|280|$7890|
|EPYC 7573X|32|64|2800|3600|768 MB|128 x 4.0|8 x DDR4-3200|280|$5590|
|EPYC 75F3|32|64|2950|4000|256 MB|128 x 4.0|8 x DDR4-3200|280|$4860|
|EPYC 7473X|24|48|2800|3700|768 MB|128 x 4.0|8 x DDR4-3200|240|$3900|
|EPYC 74F3|24|48|3200|4000|256 MB|128 x 4.0|8 x DDR4-3200|240|$2900|
|EPYC 7373X|16|32|3050|3800|768 MB|128 x 4.0|8 x DDR4-3200|240|$4185|
|EPYC 73F3|16|32|3500|4000|256 MB|128 x 4.0|8 x DDR4-3200|240|$3521|
Looking at the new EPYC 7003 stack with 3D V-Cache technology, the top SKU is the EPYC 7773X. It features 64 Zen 3 cores and 128 threads, with a base frequency of 2.2 GHz and a maximum boost frequency of 3.5 GHz. The EPYC 7573X has 32 cores and 64 threads, with a higher base frequency of 2.8 GHz and a boost frequency of up to 3.6 GHz. Both the EPYC 7773X and 7573X have a base TDP of 280 W, although AMD specifies that all four EPYC 7003-X chips have a configurable TDP of between 225 and 280 W.
The lowest spec chip in the new line-up is the EPYC 7373X, which has 16 cores with 32 threads, a base frequency of 3.05 GHz, and a boost frequency of 3.8 GHz. Moving up the stack, the EPYC 7473X is a 24c/48t option with a base frequency of 2.8 GHz and a boost frequency of up to 3.7 GHz. Both have a base TDP of 240 W, but like the bigger parts, AMD has confirmed that the 16-core and 24-core models will have a configurable TDP of between 225 W and 280 W.
Notably, all of these new Milan-X chips have some kind of clockspeed regression over their regular Milan (max core performance) counterparts. In the case of the 7773X, this is the base clockspeed, while the other SKUs all drop a bit on both base and boost clockspeeds. The drop is necessitated by the V-Cache, which at about 26 billion extra transistors for a full Milan-X configuration, eats into the chips' power budget. So with AMD opting to keep TDPs consistent, clockspeeds have been dialed down a bit to compensate. As always, AMD's CPUs will run as fast as heat and TDP headroom allows, but the V-Cache equipped chips are going to reach those limits a bit sooner.
AMD's target market for the new Milan-X chips is customers who need to maximize per-core performance; specifically, the subset of workloads that benefit from the extra cache. This is why the Milan-X chips aren't replacing the EPYC 70F3 chips entirely, as not all workloads are going to respond to the extra cache. So both lineups will be sharing the top spot as AMD's fastest-per-core EPYC SKUs.
For their part, AMD is particularly pitching the new chips at the CAD/CAM market, for tasks such as finite element analysis and electronic design automation. According to the company, they've seen upwards of a 66% increase in RTL verification speeds on Synopsys' VCS verification software in an apples-to-apples comparison between Milan processors with and without V-Cache. As with other chips that incorporate larger caches, the greatest benefits are going to be found in workloads that spill out of contemporary-sized caches but fit neatly into the larger cache. Minimizing expensive trips to main memory means that the CPU cores can keep working that much more often.
Microsoft found something similar last year when it unveiled a public preview of its Azure HBv3 virtual machines back in November. At the time, the company published performance figures from its in-house testing, mainly on workloads associated with HPC. Comparing Milan-X directly against Milan, Microsoft used data from both EPYC 7003 and EPYC 7003-X chips inside its HBv3 VM platforms. It's also worth noting that the testing was done on dual-socket systems; all of the EPYC 7003-X processors announced today can be used in both 1P and 2P deployments.
The performance data published by Microsoft Azure is encouraging, and based on its in-house testing, it looks as though the extra L3 cache is playing a big part. In Computational Fluid Dynamics, Microsoft noted a better speed-up with fewer elements, which has to be taken into consideration. The company stated that with its current HBv3 series, customers can expect gains of up to 80% in Computational Fluid Dynamics performance compared to the previous HBv3 VM systems with Milan.
Wrapping things up, AMD's EPYC 7003-X processors are now generally available to the public. With prices listed on a 1K unit order basis, AMD says the EPYC 7773X with 64C/128T will be available for around $8800, while the 32C/64T model, the EPYC 7573X, will cost about $5590. Moving down, the EPYC 7473X with 24C/48T will cost $3900, and the entry-level EPYC 7373X with 16C/32T will, curiously, cost slightly more at $4185.
Given the large order sizes required, the overall retail price is likely to be slightly higher for one unit. Though with the majority of AMD's customers being server and cloud providers, no doubt AMD will have some customers buying in bulk. Many of AMD's major server OEM partners are also slated to begin offering systems using the new chips, including Dell, Supermicro, Lenovo, and HPE.
Finally, consumers will get their own chance to get their hands on some AMD V-cache enabled CPUs next month, when AMD's second V-cache product, the Ryzen 7 5800X3D, is released. The desktop processor is based around a single CCD with a whopping 96 MB of L3 cache available, all of which contrasts nicely with the much bigger EPYC chips.
Comments
ddhelmet - Monday, March 21, 2022 - link
EPYC 74F3 24 78
typo i think
Ryan Smith - Monday, March 21, 2022 - link
Thanks!
cchi - Monday, March 21, 2022 - link
Does anyone know how they managed to add 64MB instead of 32MB? The size of the cache die seems the same as the cache portion of the CCD.
Is AMD using some kind of dual layer process, similar to nand flash layers?
Or perhaps using some kind of denser but slower cache implementation?
nandnandnand - Monday, March 21, 2022 - link
It's two layers, check the analysis here:
TSMC also demonstrated 12 inactive layers. It's possible that 2 layers is nowhere near the limit, but it's what yields allow for now. You would hope that 12 layers would be less than 6x the cost of 2 layers in the future.
Samsung also announced "X-Cube" 3D SRAM back in 2020, with an unknown amount of layers:
cchi - Monday, March 21, 2022 - link
Thank you for the links, they were very helpful, although perhaps in the reverse way. It actually is confirmed in there that it is a single but denser layer.
Right at the end of the article in the first link:
"The V-Cache is a single 64 MB die, and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process, AMD knows that TSMC can do multiple stacked dies, however AMD is only talking about a 1-High stack at this time which it will bring to market."
It would be interesting to investigate what effect this has on the latency. I am sure Anandtech is already on top of it =).
nandnandnand - Monday, March 21, 2022 - link
Oops. I don't even want to count how many times I've made that mistake.
Thunder 57 - Monday, March 21, 2022 - link
From Tomshardware:
"As a result, the L3 chiplet provides the same 2 TB/s of peak throughput as the on-die L3 cache, but it only comes with a four-cycle latency penalty."
back2future - Monday, March 21, 2022 - link
taking above numbers:
128 threads are about $60 each thread (for comparing to cores with only 1 thread, 64threads are ~$140 each core)
512MB 3D SRAM is about $1.7/MB on 64 core EPYC 7773X (or 7763) and lower $1.4/MB for EPYC 7573X (or 75F3), 32 core, around $100/core for 24 core 74x3(X) (~$1.9/MB) and some $195/core (but each class highest Base/1T freq on lower 240W TDP) for 73x3(X) 16 core EPYC (with ~$1.3/MB),
would sum to ~$1.57/MB for EPYC 3D SRAM on average?
back2future - Monday, March 21, 2022 - link
if 16 threads are worth 40W TDP budget, and 512MB 3D V-cache is a Base freq difference of ~250-300MHz and 16 threads are worth a TDP budget comparable to ~250MHz Base freq (32 cores ~500-600MHz) on dominance of conventional L3 cache (256MB types) compared to additional 64MB V-cache, then 512MB 3D V-cache is worth roughly 40(to <80)W TDP on highest sRAM load also?
Difficult for 12layers on that production node?