
Original Link: https://www.anandtech.com/show/3470
LINPACK: Intel's Nehalem versus AMD Shanghai
by Johan De Gelas on November 28, 2008 12:00 AM EST - Posted in IT Computing general
A "beta BIOS update" broke compatibility with ESX, so we had to postpone the virtualization testing on our quad-CPU AMD 8384 system.
So we started an in-depth comparison of the 45 nm Opterons, Xeons and Core i7 CPUs instead. One of our benchmarks, the famous LINPACK (you can read all about it here), painted a pretty interesting performance picture.
We had to test with a matrix size of 18000 (which requires about 2.5 GB of RAM), as we only had 3 GB of DDR-3 on the Core i7 platform. That should not be a huge problem, as we tested with only one CPU. We normally need about 4 GB for each quad-core CPU to reach the best performance.
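To make the memory figure concrete, here is a minimal sketch of the arithmetic, assuming the benchmark stores one dense N x N matrix of 8-byte double-precision values and ignoring the smaller workspace buffers. The function names are ours and purely illustrative; they are not part of Intel's LINPACK package.

```python
import math

BYTES_PER_DOUBLE = 8  # assumption: double-precision (64-bit) matrix elements


def linpack_matrix_bytes(n: int) -> int:
    """Bytes needed to hold a dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def max_matrix_size(available_bytes: int) -> int:
    """Largest n whose n x n double matrix fits in available_bytes."""
    return math.isqrt(available_bytes // BYTES_PER_DOUBLE)


if __name__ == "__main__":
    n = 18000
    size = linpack_matrix_bytes(n)
    print(f"N = {n}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
    # Hypothetical budget: 4 GB (decimal) devoted entirely to the matrix.
    print(f"4 GB budget: N <= {max_matrix_size(4 * 10**9)}")
```

For N = 18000 this gives roughly 2.59 GB (2.41 GiB), in line with the 2.5 GB quoted above and comfortably inside the 3 GB available on the Core i7 platform. In practice the operating system and the benchmark itself need memory too, so the usable N is somewhat below what `max_matrix_size` suggests.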
We also used the 9.1 version of Intel's LINPACK, as we wanted the same binary on both platforms. As we have shown before, this version of LINPACK performs best on both AMD and Intel platforms when the matrix size is low. Unfortunately, the current 10.1 version does not work on AMD CPUs.
We don't pretend that the comparison is completely fair: the Core i7 (Nehalem) platform uses unbuffered RAM, which has slightly lower latency and higher bandwidth than the Xeon "Nehalem" will get. But we had to satisfy our curiosity: how does the new "Shanghai" core compare to "Nehalem"?

Quite interesting, don't you think? Hyperthreading (SMT) gives the Nehalem core a significant advantage in most multi-threaded applications, but not in LINPACK: it slows the CPU down by 10%. Have we found the first multi-threaded application that is slowed down by Hyperthreading on Nehalem? That should not spoil the fun for Intel though, as many other HPC benchmarks show a larger gap. AMD has the advantage of being first to market: Nehalem-based Xeons are still a few months away.
Also, the impact of the memory subsystem is limited, as a 50% increase in memory speed results in a meager 6% performance increase. The Math Kernel Library is so well optimized that the effect of memory speed is minimized. This is in great contrast to other HPC applications, where the triple-channel DDR-3 memory system of Nehalem really pays off. More later...
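
One rough way to read the 50% versus 6% figures is an Amdahl-style estimate. The sketch below assumes, purely for illustration, that a fixed fraction of the run time scales inversely with memory speed and that the rest is unaffected. That model is our simplification, not something we measured, and the function name is hypothetical.

```python
def memory_bound_fraction(mem_speedup: float, overall_speedup: float) -> float:
    """Fraction f of run time that scales with memory speed.

    Assumed model: new_time / old_time = (1 - f) + f / mem_speedup,
    and by definition new_time / old_time = 1 / overall_speedup.
    Solving for f gives the expression below.
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / mem_speedup)


if __name__ == "__main__":
    # 50% faster memory (1.5x) yielded 6% more LINPACK performance (1.06x).
    f = memory_bound_fraction(mem_speedup=1.5, overall_speedup=1.06)
    print(f"Estimated memory-bound share of run time: {f:.1%}")
```

Under that assumption, only about 17% of the LINPACK run time depends on memory speed, which fits the picture of a heavily cache-optimized kernel.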