AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer lookby Johan De Gelas on November 27, 2007 6:00 AM EST
- Posted in
- IT Computing
The Opteron 2360SE - the Facts
Getting back to AMD, the new quad-core chip still lives under veil of secrecy. Quite a few rumors and myths are going around and we investigated them one by one so we could be sure that you only get the facts.
Fact 1: The B2 stepping does not have a much faster memory controller than the B1 stepping
The controller found in stepping "2" might be a tiny bit faster, but we have not found any significant difference. Our Stream benchmarks were only a tiny bit faster on the 2.5GHz (Stepping 2) than on the 2GHz (Stepping DR-B1) and so were the latency numbers.
The 2.5GHz Barcelona is a newer stepping than the 2GHz sample we tested earlier.
Fact 2: The 2.5GHz review sample is running at 1.2V; it is not overclocked
CPU-Z reported that the chip was running at 1.5V, while the 2GHz quad-core was running at 1.2V.
Power measurements show that the BIOS of our ASUS board is accurate, but CPU-Z is not. The last evidence is of course the laser marking on the quad-core Opteron.
AMD is capable of producing 2.5GHz quad-core, but not in large quantities at this time. The 2.5GHz part should arrive at the end of this year, with large quantities expected in the first quarter of 2008.
Fact 3: the memory controller always runs RAM at the rated frequency
In this case, the new quad-core Opteron is completely different from what we have seen with previous Opteron (and Athlon 64) processors. In the first and second generation Opteron, the memory controller ran at a divisor of the CPU. This resulted in very odd memory clocks speeds at times, particularly on odd multipliers. For example, a 2.2GHz Opteron (11X multiplier) uses a divisor of 7 and ends up running DDR2-667 (333.5MHz clock) at 314MHz. That gives DDR2-628 instead of 667. In reality, this doesn't have any major performance impact, and it is only measurable with "Stream-like" benchmarks. In contrast, Barcelona's memory controller runs at its own frequency and will run the DIMMs at the rated speed.
Fact 4: By running the Northbridge at a lower speed, the new quad-core loses a bit of performance but saves power
The core of the Opteron 2360 runs at 2.5GHz, but the L3 cache runs at Northbridge frequency, which is 2GHz. It seems that AMD's engineers felt that running the L3 at core frequency would not have resulted in significantly higher performance, but significantly higher power dissipation. From another point of view, given a certain power envelope, running the Northbridge and the L3 cache at higher frequencies would result in lower core frequencies.
Fact 5: The L3 cache was a good choice, but…
The L3 cache does increase latency of accessing the main memory but decreases the average latency seen by the CPU. This leads to the question of whether the relatively slow L3 cache is really an advantage. The L3 cache has a latency of 43 cycles (2GHz) or 48 cycles (2.5GHz), but it's still quite a bit faster than system memory, which takes about 130 to 170 cycles to access.
In addition, it has one main advantage for server workloads. If more than one core is accessing a cacheline in the L3 cache, it will remain in the L3. If not, the L3 cache will behave like a fully exclusive cache: it will send the cacheline to the L1 and throw out the cacheline to make place for a "victim" of the L2. This allows relatively fast sharing of data between threads, which is important for large code footprint applications like database applications and others. For single-threaded applications, it looks like they get a 2.5MB L2 cache, although with an average latency of about 20 cycles.
Still, there is no doubt that the L3 cache of Barcelona could have been a bit bigger to score even better in the larger database benchmarks such as TPC. We have to guess that a larger L3 cache became a victim (pun intended) of the already large 283 mm² die size. Still, a 44 cycle latency (and more) is rather disappointing for only 2MB of L3 cache.
Fact 6: Dual-Link is possible with AMD 2xxx Opterons
Several readers asked us how it was possible that our ASUS KFSN4-DRE board linked our 2350 CPUs with two instead of one HyperTransport point-to-point connection, as the 23xx Opteron supports only one coherent HyperTransport link. However, the constraint is not the number of links but actually the number of coherent responses that are supported. Our ASUS board does feature twice as much bandwidth for CPU-to-CPU traffic (snoop, access to remote memory etc.)