The AMD 3rd Gen Ryzen Deep Dive Review: 3700X and 3900X Raising The Bar
by Andrei Frumusanu & Gavin Bonshor on July 7, 2019 9:00 AM EST
** = Old results marked were performed with the original BIOS & boost behaviour as published on 7/7.
Benchmarking Performance: CPU System Tests
Our System Test section focuses on real-world testing and user experience, with a slight nod to throughput. In this section we cover application loading time, image processing, simple scientific physics, emulation, neural simulation, optimized compute, and 3D model development, with a combination of readily available and custom software. For some of these tests, the bigger suites such as PCMark do cover them (we publish those values in our office section), although multiple perspectives are always beneficial. In all our tests we will explain in depth what is being tested, and how we are testing.
All of our benchmark results can also be found in our benchmark engine, Bench.
Application Load: GIMP 2.10.4
One of the most important aspects of user experience and workflow is how fast a system responds. A good test of this is to see how long it takes for an application to load. Most applications these days, when on an SSD, load almost instantly; however, some office tools require asset pre-loading before being available. Most operating systems employ caching as well, so when certain software is loaded repeatedly (web browser, office tools), it can be initialized much more quickly.
In our last suite, we tested how long it took to load a large PDF in Adobe Acrobat. Unfortunately this test was a nightmare to program for, and didn’t transfer over to Win10 RS3 easily. In the meantime we discovered an application that can automate this test, and we put it up against GIMP, a popular free open-source photo editing tool, and the major alternative to Adobe Photoshop. We set it to load a large 50MB design template, and perform the load 10 times with 10 seconds in-between each. Because of caching, the first 3-5 results are often slower than the rest, and time to cache can be inconsistent, so we take the average of the last five results to show CPU processing on cached loading.
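As a rough illustration of the procedure described above (ten loads, a fixed pause between them, and an average over the last five), here is a minimal Python sketch. It is not the automation tool used for the review; the names `time_loads` and `cached_load_score` are hypothetical, and `load_fn` stands in for whatever actually launches the application and waits for the template to finish loading.

```python
import time
from statistics import mean


def time_loads(load_fn, runs=10, pause_s=10.0,
               clock=time.perf_counter, sleep=time.sleep):
    """Time `runs` consecutive calls to `load_fn`, pausing between them.

    `load_fn` is a placeholder (assumption) for the real application load.
    `clock` and `sleep` are injectable so the routine can be tested quickly.
    """
    durations = []
    for i in range(runs):
        start = clock()
        load_fn()
        durations.append(clock() - start)
        if i < runs - 1:
            sleep(pause_s)  # idle gap between loads
    return durations


def cached_load_score(durations, keep_last=5):
    """Average only the final `keep_last` runs, discarding the cache warm-up."""
    if len(durations) < keep_last:
        raise ValueError("not enough runs to average")
    return mean(durations[-keep_last:])
```

Dropping the early runs rather than averaging all ten keeps the inconsistent cache warm-up out of the reported figure, so the score reflects CPU work on an already-cached load.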
Application loading is typically single thread limited, but we see here that at some point it also becomes core-resource limited. Having access to more resources per thread in a non-HT environment helps the 8C/8T and 6C/6T processors get ahead of both of the 5.0 GHz parts in our testing.
3D Particle Movement v2.1: Brownian Motion
Our 3DPM test is a custom built benchmark designed to simulate six different particle movement algorithms of points in a 3D space. The algorithms were developed as part of my PhD, and while they ultimately perform best on a GPU, they provide a good idea of how instruction streams are interpreted by different microarchitectures.
A key part of the algorithms is the random number generation – we use relatively fast generation which ends up implementing dependency chains in the code. The upgrade over the naïve first version of this code solved for false sharing in the caches, a major bottleneck. We are also looking at AVX2 and AVX512 versions of this benchmark for future reviews.
For this test, we run a stock particle set over the six algorithms for 20 seconds apiece, with 10 second pauses, and report the total rate of particle movement, in millions of operations (movements) per second. We have a non-AVX version and an AVX version, with the latter implementing AVX512 and AVX2 where possible.
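To make the shape of the measurement concrete, the sketch below moves a set of particles by random unit steps for a fixed duration and reports millions of movements per second. It is an illustrative stand-in only: the single movement algorithm, the particle count, and the names `random_unit_step` and `run_for` are assumptions, and none of this is the 3DPM source.

```python
import math
import random
import time


def random_unit_step(rng):
    """Return a direction drawn uniformly from the unit sphere.

    One simple movement algorithm, chosen for illustration (assumption).
    """
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def run_for(duration_s, n_particles=1000, seed=0, clock=time.perf_counter):
    """Move every particle repeatedly for `duration_s` seconds.

    Returns the rate in millions of movements per second.
    """
    rng = random.Random(seed)
    particles = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    moves = 0
    start = clock()
    while clock() - start < duration_s:
        for p in particles:
            dx, dy, dz = random_unit_step(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
        moves += n_particles
    elapsed = clock() - start
    return moves / elapsed / 1e6
```

Note that each step depends on the random number generator's previous state, which is the kind of dependency chain mentioned earlier; a pure Python loop like this will of course be far slower than a compiled benchmark.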
3DPM v2.1 can be downloaded from our server: 3DPMv2.1.rar (13.0 MB)
With a non-AVX code base, the 9900K shows the IPC and frequency improvements over the R7 2700X, although in reality it is not as big of a percentage jump as you might imagine. The processors without HT get pushed back a bit here.
Dolphin 5.0: Console Emulation
One of the most popular requested tests in our suite involves console emulation. Being able to pick up a game from an older system and run it as expected depends on the overhead of the emulator: it takes a significantly more powerful x86 system to be able to accurately emulate an older non-x86 console, especially if code for that console was made to abuse certain physical bugs in the hardware.
For our test, we use the popular Dolphin emulation software, and run a compute project through it to determine how close to a standard console system our processors can emulate. In this test, a Nintendo Wii would take around 1050 seconds.
The latest version of Dolphin can be downloaded from https://dolphin-emu.org/
DigiCortex 1.20: Sea Slug Brain Simulation
This benchmark was originally designed for simulation and visualization of neuron and synapse activity, as is commonly found in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron / 1.8B synapse simulation, equivalent to a Sea Slug.
Example of a 2.1B neuron simulation
We report the results as the ability to simulate the data as a fraction of real-time, so anything above a ‘one’ is suitable for real-time work. Out of the two modes, a ‘non-firing’ mode which is DRAM heavy and a ‘firing’ mode which has CPU work, we choose the latter. Despite this, the benchmark is still affected by DRAM speed a fair amount.
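The reported figure is a simple ratio, which can be written out as below. This is only a sketch of the arithmetic as described in this section; the function names are hypothetical and are not part of DigiCortex.

```python
def realtime_factor(simulated_s, wall_s):
    """Simulated time divided by the wall-clock time it took to compute."""
    if wall_s <= 0:
        raise ValueError("wall-clock time must be positive")
    return simulated_s / wall_s


def is_realtime_capable(factor):
    """Anything above one means the simulation keeps up with real time."""
    return factor > 1.0
```

For example, simulating 10 seconds of activity in 5 seconds of wall-clock time gives a factor of 2.0, while taking 20 seconds gives 0.5 and falls short of real time.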
DigiCortex can be downloaded from http://www.digicortex.net/
y-Cruncher v0.7.6: Microarchitecture Optimized Compute
I’ve known about y-Cruncher for a while, as a tool to help compute various mathematical constants, but it wasn’t until I began talking with its developer, Alex Yee, a researcher from NWU and now software optimization developer, that I realized that he has optimized the software like crazy to get the best performance. Naturally, any simulation that can take 20+ days can benefit from a 1% performance increase! Alex started y-cruncher as a high-school project, but it is now at a state where Alex is keeping it up to date to take advantage of the latest instruction sets before they are even made available in hardware.
For our test we run y-cruncher v0.7.6 through all the different optimized variants of the binary, including the AVX-512 optimized binaries. The test is to calculate 250m digits of Pi, and we report both single-threaded and multi-threaded results.
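For readers curious what computing digits of Pi looks like at a very small scale, here is a sketch using Machin's formula with plain integer arithmetic. This is explicitly not y-cruncher's method, which is far more sophisticated and heavily optimized; the function names and the choice of ten guard digits are our own assumptions, and the approach is only practical for a modest number of digits.

```python
def arctan_inv(x, one):
    """Return arctan(1/x) scaled by `one`, using the integer Taylor series."""
    power = one // x          # (1/x)^1, scaled
    total = power
    x2 = x * x
    n = 3
    sign = -1
    while power:
        power //= x2          # next odd power of 1/x
        total += sign * (power // n)
        sign = -sign
        n += 2
    return total


def pi_digits(digits):
    """Return Pi as an integer: 3 followed by `digits` decimal places.

    Uses Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239).
    Extra guard digits absorb the truncation error of integer division.
    """
    guard = 10
    one = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, one) - arctan_inv(239, one))
    return pi_scaled // 10 ** guard
```

Calling `pi_digits(10)` returns the integer 31415926535, that is, 3.1415926535 without the decimal point.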
Users can download y-cruncher from Alex’s website: http://www.numberworld.org/y-cruncher/
Agisoft Photoscan 1.3.3: 2D Image to 3D Model Conversion
One of the ISVs that we have worked with for a number of years is Agisoft, who develop software called PhotoScan that transforms a number of 2D images into a 3D model. This is an important tool in model development and archiving, and relies on a number of single threaded and multi-threaded algorithms to go from one side of the computation to the other.
In our test, we use version 1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos, and push it through a reasonably fast variant of the algorithms. We report the total time to complete the process.
Agisoft’s Photoscan website can be found here: http://www.agisoft.com/
447 Comments
FireSnake - Sunday, July 7, 2019
Awesome! I have been waiting for this one.
Let us start reading.
WaltC - Sunday, July 7, 2019
One thing I noticed before I return to the reading is the odd bit about chipsets and memory speeds. Pretty sure the memory controller is on the CPU itself as opposed to the chipset, and I've been running DDR4-3200 XMP CL16 on my Ryzen 1 on both x370 and x470 MSI motherboards with no problems--the same DDR4 2x8 config moved from one motherboard to the next.
futrtrubl - Sunday, July 7, 2019
Guaranteed supported memory speeds and what overclocked memory can generally be used are two very separate things. And yes, that 3200 memory is considered an overclock for the CPU.
WaltC - Sunday, July 7, 2019
Right--so why tie the memory controller to the chipset? Quote: "Some motherboard vendors are advertising speeds of up to DDR4-4400 which until X570, was unheard of. X570 also marks a jump up to DDR4-3200 up from DDR4-2933 on X470, and DDR4-2667 on X370." Almost every x370, x470 motherboard produced will run DDR-4 3200 XMP ROOB. There's an obvious difference between exceeding JEDEC standards with XMP configurations and overclocking the cpu--which I've also done, but that's beside the point. Pointing out present JEDEC limitations overcome with XMP configurations is a far cry from understanding that the chipset doesn't control the memory speeds--the memory controller on the cpu is either capable of XMP settings or it isn't. Ryzen 1 is up to the task. You can also take a gander at vendor-specific motherboard ram compatibility lists to see lots of XMP 3200MHz compatibility with Ryzen 1 (and of course 2k and 3k series).
edzieba - Sunday, July 7, 2019
The new chipset means new boards, to which can be applied more stringent requirements of trace routing for DDR. Same as with the more stringent requirements for PCIe routing for PCIe 4.0.
WaltC - Sunday, July 7, 2019
OK--understood--but improved trace, imo, is mainly for PCIe4.x support with x570-- really not for DDR 3200 support, however, which has already been supported well in x370/x470 motherboards--which I know from practical experience....;) In my case it was as simple as activating the XMP profile #2 in the bios, saving the setting and rebooting. Simply was surprised to see someone tying the mem controller to the chipset! I know that the Ryzen mem controller in the CPU has been improved for Ryzen 3k series, but that has more to do with attaining much higher clocks > 3200MHz for the ram, and is relative to the CPU R 3k series, as opposed to the x570 chipset, since the mem controller isn't in the x570 chipset. All I wanted to say initially is that both DDR 4 3000 & 3200MHz have been supported all the way back to x370 boards, not by the chipset, but by the Ryzen memory controller--indeed, AMD released several AGESA versions for motherboard vendors to implement in their bioses to improve compatibility with many different brands of memory, too.
BikeDude - Sunday, July 7, 2019
You mentioned 2x8GB. Try with 2x16GB and you might not be as lucky or will have to work harder to get the timing right. Motherboards that only seat two DIMMs will be noticeably easier than four DIMM motherboards.
If AMD did anything to help grease the wheels, I'm sure many users will appreciate that.
FWIW, this overclocking guide has helped me a lot: https://www.techpowerup.com/review/amd-ryzen-memor...
mat9v - Sunday, July 7, 2019
Does anyone know if 3900X has 3 cores for each CCX (as in 1 core in each CCX disabled) or does it have two CCX's of 4 cores and two CCX's of 2 cores?
photonboy - Thursday, July 11, 2019
3+3
rarson - Monday, July 8, 2019
WaltC, you're correct. The memory controller is part of the IO die, not the chipset. The chipset is connected to the IO die via 4 PCIe lanes.
While the subsequent iterations of Ryzen have indeed improved memory support along with the new chipsets, the chipsets have nothing to do with that. I'm assuming the author is using the chipsets to delineate generations of memory improvement, but it could be just as easily (and more clearly) stated by referring to the generation of Ryzen processors.