AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer lookby Johan De Gelas on November 27, 2007 6:00 AM EST
- Posted in
- IT Computing
The new Xeon 45nm (or Xeon 54xx series) arrived in the lab weeks ago, so a server CPU benchmark update is overdue. However, there is another reason we decided to write this article. When AMD launched their newest quad-core chips, we could only give you a preview of its performance - not a full review. We did not fully understand many of the performance aspects of the new AMD architecture, so we decided to delve a little deeper.
We will only focus on performance in this article, as our primary goal is to get an idea where Barcelona (AMD's quad-core), Harpertown (Intel 45nm quad-core Xeon), and Clovertown (Intel quad-core 65nm Xeon) stand. To do so, we performed a minimal profiling of each of our benchmarks and we used several new micro benchmarks that will tell you a lot more than some real world benchmarks can. If you like understanding the benchmarks out there a bit better, dig in.
The new 45 Xeon
The new 45nm Xeon 54xx series, aka "Harpertown", is still based on the Core architecture, but it has been tweaked a bit. We have already discussed those improvements in detail here and here, so we won't discuss them in detail again, but here's a quick overview:
- Faster 4-bit divider (Radix-16) instead
of 2-bit divider
- Up to 1600MHz FSB
- Shared 6MB (24-way set associative) instead 4MB L2 cache (per dual-core
- Super Shuffle engine (For SSE instructions)
- Split Load Cache Enhancement
The Radix 16 divider is the most interesting of these improvements. Dividing involves a repetition of subtractions, tests "if-it-fits" and shifts. If you can do this with four bits at a time instead of two, this means that you can cut the number of these iterations required to get your result in half. The square root calculation is similar and also benefits from these improvements. While divisions and square roots are rather rare in common software, they have a very significant performance impact. Contrary to the more "popular" instructions, they are not pipelined and the latency of these instructions is high. For example, a floating-point multiply takes five cycles and the "Clovertown Core" architecture can finish one every two cycles thanks to pipelining. However, a floating-point division takes no less than 32 cycles and cannot be pipelined at all.
Nanotechnology is here: a core with 820 million transistors and you can fit at least two of them in one coin. (Photo by Tjerk Ameel)
Besides being an improved "Clovertown", the new Xeon is also a marvel of nanotechnology with no less than 410 million transistors on a die of only 107 mm². Two die make one quad-core Xeon 54xx. Considering that this CPU is close to behemoth CPUs like Itanium and Power 6 in SPECint performance, the new Intel is a formidable adversary for AMD's newest quad-core.
There is more than the CPU of course. The new CPU works on the old "Bensley/5000P chipset" platform, though we were not able to get it running on our P5000PSL Intel motherboard despite the fact that we applied the BIOS update that came out early this month.
There is also a new HPC/workstation platform for the Intel Xeon thanks to the Seaburg chipset, which features an improved snoop filter. Besides reducing the snoop traffic, the new chipset should also be able to extract more bandwidth out of the same FBDIMMs. The support for DDR2-800 FBDIMMs should bring another performance boost but our current test platform is only stable with DDR2-667 FBDIMMs.