The upcoming Intel Nehalem CPU has been in the spotlight for months now. In contrast, and despite its huge die size and 1.9 billion (!) transistors, the 6-core Xeon 74xx has been a wallflower for both the public and Intel's marketing. However, if you've invested in the current Intel platform, the newly launched Xeon 74xx series deserves a lot more attention.

The Xeon 74xx, formerly known as Dunnington, is indeed a very interesting upgrade path for the older quad socket platform. All Xeon 74xx CPUs use the same mPGA604 socket as previous Xeons and are electrically compatible with the Xeon 73xx series. The Xeon 73xx, also known as Tigerton, was basically the quad-core version of the Xeon 53xx (Clovertown) that launched at the end of 2006. The new hex-core Dunnington combines six of the latest 45nm Penryn cores on a single die. As you may remember from our dual socket 45nm Xeon 54xx review, the 45nm Penryn core is about 10% to 20% faster than its older 65nm brother (Merom). There is more: an enormous 12MB to 16MB L3 cache ensures that those six cores access high latency main memory a lot less often. This huge L3 also reduces the amount of "cache syncing" traffic between the CPUs, an important bottleneck for the current Intel server platforms.

2.66GHz, 6 cores, 3x3MB L2, and 16MB L3 cache: a massive new Intel CPU

With at least 10% to 20% better performance per core, two extra cores per CPU package, and an upgrade that only requires a BIOS update, the newest Xeon 7460 should be an attractive proposition if you are short on processing power.

Six Cores?

Dunnington was announced at past IDFs as "extending the MP leadership". Readers of our last quad socket report understand that this is a questionable claim. Since AMD introduced the Opteron 8xxx in April 2003, there has never been a moment when Intel was able to lead the dance in the quad socket server market. Sure, the Intel 73xx was able to outperform the AMD chip in some areas (rendering), but the AMD quad-core was still able to keep up with the Intel chip in Java, ERP, and database performance. When it comes to HPC, the AMD chip was clearly in the lead.

Dunnington might not be the darling of Intel marketing, but the chip itself is a very aggressive statement: let us "Bulldoze" AMD out of the quad socket market with a truly gigantic chip that only Intel can produce without losing money. Intel is probably - courtesy of the impressive ultra low leakage 45nm high-K process technology - the only one capable of producing large quantities of CPUs containing 1.9 billion transistors, resulting in an enormous die size of 503 mm2. That is almost twice the size of AMD's upcoming 45nm quad-core CPU Shanghai. Even IBM's flagship POWER6 processor (up to 4.7GHz) is only 341 mm2 and only has 790 million transistors.

Processor Size and Technology Comparison

CPU                 Transistor count (millions)   Process   Die size (mm²)    Cores
Intel Dunnington    1900                          45 nm     503               6
Intel Nehalem       731                           45 nm     265               4
AMD Shanghai        705                           45 nm     263               4
AMD Barcelona       463                           65 nm     283               4
Intel Tigerton      2 x 291 = 582                 65 nm     2 x 143 = 286     4
Intel Harpertown    2 x 410 = 820                 45 nm     2 x 107 = 214     4

The huge, somewhat irregular die - notice how the two cores in the top right corner are further away from the L3 cache than the other four - raises some questions. Such an irregular die could introduce extra wire delays, reducing the attainable clock speed somewhat. So why did Intel not go for an 8-core design? The basic explanation that Patrick Gelsinger, General Manager of Intel's Digital Enterprise Group, gave was that simulations showed a 6-core with a 16MB L3 outperforming an 8-core with a smaller L3 in the applications that matter most in the 4S/8S socket space.

Layout of the new hex-core

TDP was probably the most important constraint that determined the choice of six cores, since core logic consumes a lot more power than cache. An 8-core design would make it necessary to reduce the clock speed too much. Even at 65nm, Intel was already capable of producing caches that needed less than 1W/MB, so we can assume that the 16MB cache consumes around 16W or less. That leaves more than 100W for the six cores, which allows decent clock speeds at very acceptable TDPs as you can see in the table below.
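The back-of-the-envelope arithmetic above can be sketched quickly. Note that the ~1W/MB cache figure is an estimate derived from the 65nm-era observation in the text, not an official Intel specification:

```python
# Rough power budget for the X7460, assuming the ~1 W/MB cache
# figure quoted above (an estimate, not an Intel spec).
tdp_w = 130                      # X7460 TDP in watts
l3_mb = 16                       # L3 cache size in MB
cache_w = l3_mb * 1.0            # ~16 W for the 16MB L3
core_budget_w = tdp_w - cache_w  # ~114 W left for the six cores
per_core_w = core_budget_w / 6   # ~19 W per Penryn core
print(cache_w, core_budget_w, per_core_w)  # 16.0 114.0 19.0
```

Around 19W per core is well within what a 45nm Penryn core can sustain at 2.66GHz, which is why six cores plus a big L3 fit in the same 130W envelope as the quad-core X7350.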

Processor Speed and Cache Comparison

Xeon model   Speed (GHz)   Cores   L2 cache (MB)   L3 cache (MB)   TDP (W)
X7460        2.66          6       3 x 3           16              130
E7450        2.4           6       3 x 3           12              90
X7350        2.93          4       2 x 4           0               130
E7440        2.4           4       2 x 3           12              90
E7340        2.4           4       2 x 4           0               80
E7330        2.4           4       2 x 3           0               80
E7430        2.13          4       2 x 3           12              90
E7420        2.13          4       2 x 3           8               90
L7455        2.13          6       3 x 3           12              65
L7445        2.13          4       2 x 3           12              50

The other side of the coin is that Dunnington probably uses an L3 cache that runs at half the clock speed of the cores. We measured a 103-cycle latency for the L3 cache on a 2.66GHz CPU, which works out to about 39 ns.

Dunnington cache hierarchy

In comparison, the - admittedly much smaller - L3 cache of the quad-core Opteron needs only 48 cycles (on a 2.5GHz chip, or about 19 ns). Dunnington's L3 is thus roughly half as fast as the one found in the Barcelona core: a compromise where the engineers traded speed for size and power consumption.
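The cycle-to-nanosecond conversions in the last two paragraphs follow directly from the clock period: at f GHz, one cycle lasts 1/f ns. A minimal sketch reproducing both figures:

```python
# Convert a measured cache latency from clock cycles to nanoseconds.
# At a clock of f GHz, one cycle lasts 1/f nanoseconds.
def latency_ns(cycles: int, clock_ghz: float) -> float:
    return cycles / clock_ghz

print(round(latency_ns(103, 2.66), 1))  # Dunnington L3: 38.7 ns (~39 ns above)
print(round(latency_ns(48, 2.5), 1))    # Barcelona L3:  19.2 ns (~19 ns above)
```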

Price Comparisons
Comments

  • npp - Tuesday, September 23, 2008 - link

    I didn't get this one very clear - why should a bigger cache reduce cache syncing traffic? With a bigger cache, you run the potential risk of one CPU invalidating a larger portion of the data another CPU already has in its own cache, hence there would be more data to move between the sockets in the end. If we exaggerate this, every CPU having a copy of the whole main memory in its own cache would obviously lead to an enormous syncing effort, not the opposite.

    I'm not familiar with the cache coherence protocol used by Intel on that platform, but even in the positive scenario of a CPU having data for read-only access in its own cache, a request from another CPU for the same data (the chance for this being bigger given the large cache size) may again lead to increased inter-socket communication, since these data won't be fetched from main memory again.

    In all cases, inter-socket communication should be much cheaper than the cost of a main memory access, and it shifts the balance in the right direction - avoiding main memory as long as possible. And now it's clear why Dunnington is a six- rather than eight-core - more cores and less cache would yield a shift in the entirely opposite direction, which isn't what Intel needs until QPI arrives.

  • narlzac85 - Wednesday, September 24, 2008 - link

    In the best-case scenario (I hope the system is smart enough to do it this way), with each VM having 4 CPU cores, they can keep all their threads on one physical die. This means that all 4 cores are working on the same VM/data and should need minimal access to data that another die has changed (the hypervisor/host OS processes jumping around from core to core would be about it). The inter-socket cache coherency traffic will go down (in the older quad-cores, since the two physical dual-cores have to communicate over the FSB, it might as well have been an 8-socket system populated with dual-cores).
  • Nyceis - Tuesday, September 23, 2008 - link

    Can we post here now? :)
  • JohanAnandtech - Wednesday, September 24, 2008 - link

    Indeed. The IT forums gave us trouble quite a few times, and we assume quite a few people do not comment in the IT forums because they have to register again. I am still searching for a good solution, as these "comment boxes" get messy really quickly.
  • Nyceis - Tuesday, September 23, 2008 - link

    PS - Awesome article - makes me want hex-cores rather than quads in my Xen Servers :)
  • Nyceis - Tuesday, September 23, 2008 - link

    Looks like it :)
  • erikejw - Tuesday, September 23, 2008 - link

    Great article as always.
    However, the performance/watt comparison is quite useless for virtualization systems, since they scale well at a multi-system level, among other reasons.

    It won't hurt to make them, but what users really care about is performance/dollar (over the system's lifetime).

    Say the system will be in use for 3 years.
    That makes the total power bill for a 600W system about $2000, less than the cost of one Dunnington, and since the price difference between the Opteron and Dunnington CPUs is something like $4800, you've got to be pretty ignorant to choose a system based on the performance/watt cost alone.

    Let's say the AMD system costs $10,000 and the Intel $14,800 (it will be more due to DIMM differences), and both have a 3-year life; then the total cost for the systems and power will be $12,000 and $16,800.

    That leaves us with a real base cost per transaction ratio of

    Intel 5.09 : 4.25 AMD

    AMD is hence 20% more cost effective than Intel in this case.

    Any knowledgeable buyer has to look at the whole picture and not just at one cost factor.

    I hope that you include this in your other virtualization articles.

  • JohanAnandtech - Wednesday, September 24, 2008 - link

    You are right; the best way to do this is to work with TCO. We did that in our Sun Fire x4450 article. And the feedback I got was to calculate over 5 years, because that is more realistic.

    But for the rest I fully agree with you. Will do asap. How did you calculate the power bill?
  • erikejw - Wednesday, September 24, 2008 - link

    Sounds good, will be interesting.

    The calculations were just a quick and dirty 600W running 24/7 for 3 years, using current power prices.

    VM servers are supposed to run like that.

    It would also be interesting to see how the Dunnington responds when using more virtual cores than physical. Will the decline be less than the older Xeons?

    What is a typical (core)load when it comes to this?

    The Nehalems will respond more like the Athlons in this regard and not lose as much when the load increases, though at a higher level than AMD.

    I realised the other day that it seems as if AMD has built a server CPU and brought the best of it to the desktop market, while Intel has done it the other way around.

    The Nehalem architecture seems more "server-like", but it will make a bang on the desktop side too.

  • kingmouf - Thursday, September 25, 2008 - link

    I think this is because they have (or should I say had) a different CPU that they wanted to cover that space: the Itanium. But now they are fully concentrated on x86, so...
