AMD's Ryzen 9 6900HS Rembrandt Benchmarked: Zen3+ Power and Performance Scalingby Dr. Ian Cutress on March 1, 2022 9:30 AM EST
Core-to-Core, Cache Latency, Ramp
For some of our standard tests, we look at how the CPU performs in a series of synthetic workloads to example any microarchitectural changes or differences. This includes our core-to-core latency test, a cache latency sweep across the memory space, and a ramp test to see how quick a system runs from idle to load.
Inside the chip are eight cores connected through a bi-directional ring, each direction capable of transmitting 32 bytes per cycle. In this test we test how long it takes to probe an L3 cache line from a different core on the chip and return the result.
For two threads on the same core, we’re seeing a 7 nanosecond difference, whereas for two separate cores we’re seeing a latency from 15.5 nanoseconds up to 21.2 nanoseconds, which is a wide gap. Finding out exactly how much each jump takes is a bit tricky, as the overall time is reliant on the frequency of the core, of the cache, and of the fabric over the time of the test. It also doesn’t tell us if there is anything else on the ring aside from the cores, as there is also going to be some form of external connectivity to other elements of the SoC.
However, compared to the Zen3 numbers we saw on the Ryzen 9 5980HS, they are practically the same.
Cache Latency Ramp
This test showcases the access latency at all the points in the cache hierarchy for a single core. We start at 2 KiB, and probe the latency all the way through to 256 MB, which for most CPUs sits inside the DRAM.
Part of this test helps us understand the range of latencies for accessing a given level of cache, but also the transition between the cache levels gives insight into how different parts of the cache microarchitecture work, such as TLBs. As CPU microarchitects look at interesting and novel ways to design caches upon caches inside caches, this basic test proves to be very valuable.
The data here again mirrors exactly what we saw with the previous generation on Zen3.
Both AMD and Intel over the past few years have introduced features to their processors that speed up the time from when a CPU moves from idle into a high-powered state. The effect of this means that users can get peak performance quicker, but the biggest knock-on effect for this is with battery life in mobile devices, especially if a system can turbo up quick and turbo down quick, ensuring that it stays in the lowest and most efficient power state for as long as possible.
Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.
One of the issues though with this technology is that sometimes the adjustments in frequency can be so fast, software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.
We got around the issue by making the frequency probing the workload causing the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
A ramp time of within one millisecond is as expected for modern AMD platforms, although we didn’t see the high 4.9 GHz that AMD has listed this processor as being able to obtain. We saw it hit that frequency in a number of tests, but not this one. AMD’s previous generation took a couple of milliseconds to hit around the 4.0 GHz mark, but then another 16 milliseconds to go full speed. We didn’t see it in this test, perhaps due to some of the new measurements AMD is doing on core workload and power. We will have to try this on a different AMD Ryzen 6000 Mobile system to see if we get the same result.