IDF has started and the first benchmarks of Nehalem are going to start popping up. It is without a doubt an impressive architecture with a much better platform to run on, but this CPU is not about giving you better frames per second in your favorite game than the Penryn family. Let me make that clearer: even when the GPU is not the bottleneck, it is likely that most games will not be significantly faster than on Penryn. We, the people behind will probably have the most fun with it, more than your favorite review crew at :-). And no, I have not seen any tests before typing this. Nehalem is about improving HPC, database, and virtualization performance, and much less about gaming performance. Maybe this will change once games get some heavy physics threads, but not right away.

Why? Most games are about fast caches and strong integer performance. After all, most of the floating-point action is already happening on the GPU. The Core 2 CPUs were a huge step forward in integer performance (not least because of memory disambiguation) compared to the CPUs of that time (P4 and K8). Nehalem is only a small step forward in integer performance, and for games the gains from that slightly increased integer performance are mostly negated by the new cache system. In a previous post I told you that most games really like the huge L2 of the Core family. With Nehalem they are getting a 32KB L1 with a 4-cycle latency, then a very small (compared to the older Intel CPUs) 256KB L2 cache with a 12-cycle latency, and after that a pretty slow 40-cycle 8MB L3. When running on Penryn, they used to get a 3-cycle L1 and a 14-cycle 6144KB L2. The Penryn L2 is 24 times larger than the one on Nehalem!
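To make the cache argument a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The cache sizes and latencies are the ones quoted above; the hit rates and the DRAM latency are made-up, purely illustrative assumptions (nothing here is measured), and each latency is treated as the total load-to-use latency of that level. The model also ignores that Nehalem's integrated memory controller should shorten the trip to DRAM.

```python
# Cache latencies (cycles) and sizes come from the post.
# Hit rates and MEMORY_LATENCY are hypothetical, for illustration only.

PENRYN_L2_KB = 6144
NEHALEM_L2_KB = 256
MEMORY_LATENCY = 200  # assumed DRAM latency in cycles, same for both CPUs


def average_load_latency(levels, memory_latency):
    """Expected load latency in cycles.

    levels: list of (latency_cycles, hit_rate) ordered from L1 outward,
    where hit_rate is the fraction of loads reaching that level that hit.
    """
    remaining = 1.0  # fraction of loads not yet served by any cache
    total = 0.0
    for latency, hit_rate in levels:
        served = remaining * hit_rate
        total += served * latency
        remaining -= served
    # Whatever misses every cache level goes to main memory.
    return total + remaining * memory_latency


# Hypothetical game-like workload: the working set fits Penryn's big L2,
# but spills out of Nehalem's small L2 into the slower L3.
penryn = [(3, 0.95), (14, 0.98)]
nehalem = [(4, 0.95), (12, 0.70), (40, 0.95)]

print("L2 size ratio:", PENRYN_L2_KB // NEHALEM_L2_KB)
print("Penryn  avg load latency:", average_load_latency(penryn, MEMORY_LATENCY))
print("Nehalem avg load latency:", average_load_latency(nehalem, MEMORY_LATENCY))
```

Change the assumed hit rates and the picture changes with them; the point is only that a workload which used to live almost entirely in a 14-cycle L2 now pays the 40-cycle L3 for part of its accesses.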

The percentage of L2 cache misses for most games running on a Penryn CPU is extremely low. Now that is going to change. The integrated memory controller of Nehalem will help some, but the fact remains that the L3 is slow and the L2 is small. However, that doesn't mean Intel made a bad choice. Intel made a superbly good choice by improving the performance where Core (Merom/Penryn) was mediocre to good. Penryn was already a magnificent gaming CPU, but it could not beat the AMD competition in HPC benchmarks, and AMD put up a good fight in database performance benchmarks. Now Intel is ready to fix these shortcomings.

Most database code cannot use the wide architecture of Penryn very well. The number of instructions per cycle can be lower than 0.5, and waiting for memory is the most probable cause. SMT, or Hyper-Threading, can do wonders here: while one thread is stalled waiting for memory, the other thread continues working, and vice versa.
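The SMT argument can be sketched with a very simple, idealized model. Assume (hypothetically) that each thread alternates between a burst of useful work and a memory stall, and that a second hardware thread can fill the stall cycles of the first without any contention for caches or execution units. Real Hyper-Threading gains are smaller than this, so treat it as an upper bound rather than a prediction; the cycle counts below are invented for illustration.

```python
def core_utilization(compute_cycles, stall_cycles, threads=1):
    """Idealized fraction of cycles in which the core does useful work.

    Each thread computes for compute_cycles, then stalls for stall_cycles.
    Other hardware threads may run during a stall; contention is ignored.
    """
    period = compute_cycles + stall_cycles
    busy = threads * compute_cycles
    return min(1.0, busy / period)


# Hypothetical memory-bound, database-like thread:
# 20 cycles of work followed by 80 cycles waiting for memory.
one_thread = core_utilization(20, 80, threads=1)
two_threads = core_utilization(20, 80, threads=2)

print("1 thread :", one_thread)
print("2 threads:", two_threads)
```

The more time a thread spends stalled, the more room the second thread has, which is why low-IPC database workloads stand to gain more from SMT than cache-friendly game code.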

Secondly, quad (and eight) socket performance is going to improve a lot as four Nehalems only have to keep four L3 caches in sync, while a similar Tigerton system has to keep eight L2 caches in sync. That is why the cache system is perfect for server performance, but a little less interesting for gaming performance.

The massive bandwidth that the integrated tri-channel memory controller delivers should also do wonders for HPC code, and the new TLB architecture with EPT will make Nehalem shine compared to its older Core brothers.

No, Nehalem wasn't made for the gaming enthusiasts. Rather, it was made to please the IT and HPC people. So we say bring it on; it's just not that interesting for you gamers! ;-)

Comments Locked


  • gaiden2k5 - Tuesday, August 19, 2008 - link

    knowing that i wont be able to afford a Nahalem CPU, i become more interested in seeing the price drop for the Penryn's and hoping for Q9450 to fall below $200 :)
  • Nehemoth - Tuesday, August 19, 2008 - link

    I can't agreed more with u guys, cause I hope this architecture fly, course I Hope a fight between AMD and Intel on servers; is really no so easy as the desktop environment.

    I really just hope for FBDIMM improvements, no so Hot please and also of course I really hope Good To Excellent competition between Shangai and Nehalem, but the real improvement here from AMD should be when they're switch to DDR3.
  • Turas - Tuesday, August 19, 2008 - link

    Aren't FB-Dimms being dropped? I was under the impressions even the dual socket 1366 (or whatever it is) will be using regular DDR3 now.
  • MonkeyPaw - Tuesday, August 19, 2008 - link

    Yeah, FB-Dimms are not the future. They add heat, complexity, and cost. Of course, most of that doesn't affect desktop users, unless you are a MacPro/Skulltrail user. The IMC and Hypertransport have effectively killed complex memory configurations, since each CPU finally gets to manage its own memory.

    Really, Nehalem is just taking Opteron's strong points and adding it to the Core2 architecture. Intel should win a lot of server benches now. And to think they spent billions on Itanium for so many years. IA64's market keeps dwindling, and this will only chip away more. Just goes to show that Intel is not indestructible.
  • Nehemoth - Tuesday, August 19, 2008 - link

    Really nice to hear that.

    That's what we need standard, as DDR3 becomes the new one.

    Also the last time I hear an Itanium investment number was 10 Billions.

    But for some cases (In our company we have some) Itanium is the right choice.
  • AlexWade - Tuesday, August 19, 2008 - link

    The price of the old Core 2 products will come down. So even if Core i7 (or whatever crazy name it is called) doesn't help in games, it will help my wallet when I am upgrading my computer later this year.
  • IntelUser2000 - Wednesday, August 20, 2008 - link

    I also love how the latency comparisons between Yorkfield and Nehalem is being skewed in favor of the article.

    According to ANANDTECH'S benchmarks, Nehalem's L2 latency was 11 cycles and Yorkfield was 15 cycles, not 12 and 14.

    Multiple websites have argued that Core 2's performance was not achievable(4 issue cannot be fully utilized blah blah). While Nehalem won't be a miracle in single thread, I think it'll be better than what this article will imply.
  • JohanAnandtech - Friday, August 22, 2008 - link

    Do you think one cycle of L2-cache latency will matter? Depending on the tool you use, quad-core Intels report a 14 to 15 cycle L2. For Nehalem, I am just using the numbers I had at that point. It seems that 10-12 cycles is more or less accurate.

    And no I didn't argue that 4-issue can not be fully utilized (that is BS, the 4-way decoding is for peak moments, to get the average higher). I argued that well implemented SMT can help typical low IPC loads such as database workloads to achieve much higher performance.
  • Mithan - Tuesday, August 19, 2008 - link

    I've got a E8400 OC'ed to 3.6ghz, so I am good :)

    Worst case, I swap out my 8800GTS512 next year for whatever is "latest and greatest" and boom, double or triple frame rates in most games.

    Core i7 can wait.
  • Jedi2155 - Tuesday, August 19, 2008 - link

    Still running with my nearly 2 year old E6600 OC'ed to 3.6 GHz :). Of course not as fast a penryn clock for clock, but I was thinking about moving to a Nehalem. I'm happy my CPU hasn't burnt out yet with nearly 2 years of 1.55 volts being pushed through it along with hot SoCal weather.

    Still, I wonder how Nehalem helps with encoding performance.
