So far, the TLBs of Barcelona have brought AMD quite a bit of bad press. But now that the problems are fixed in revision B3, the TLBs might actually turn out to be one of the main strong points of AMD's newest platform. To understand why, it helps to read our latest IT article :-).
 
Barcelona, or AMD's K10, supports 4 KB, 2 MB and 1 GB page sizes. 2 MB pages are getting more popular (especially on Linux servers) because they significantly reduce memory management overhead: each TLB entry then maps 512 times as much memory as it does with 4 KB pages. AMD's TLB architecture:
  • Low-latency L1 TLB (data and instructions): 48 entries, supporting all page sizes
  • L2 TLB (data and instructions): 512 entries for 4 KB pages, or 128 entries for 2 MB pages
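To put numbers on the overhead argument, here is a minimal Python sketch of TLB reach (entries × page size) using the K10 L2 TLB figures from the list above. The helper names and the 1 GB working set are hypothetical, chosen only for illustration; the model assumes a fully populated TLB and ignores associativity.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered by a fully populated TLB."""
    return entries * page_size


def entries_needed(working_set: int, page_size: int) -> int:
    """Translations (TLB entries) needed to map a working set, rounded up."""
    return -(-working_set // page_size)


# K10 L2 TLB figures as listed in the article
l2_reach_4k = tlb_reach(512, 4 * KB)   # 512 x 4 KB
l2_reach_2m = tlb_reach(128, 2 * MB)   # 128 x 2 MB

if __name__ == "__main__":
    print(f"L2 TLB reach, 4 KB pages: {l2_reach_4k // MB} MB")
    print(f"L2 TLB reach, 2 MB pages: {l2_reach_2m // MB} MB")
    ws = 1024 * MB  # hypothetical 1 GB working set
    print("entries to map 1 GB with 4 KB pages:", entries_needed(ws, 4 * KB))
    print("entries to map 1 GB with 2 MB pages:", entries_needed(ws, 2 * MB))
```

With 4 KB pages the L2 TLB covers only 2 MB, while with 2 MB pages it covers 256 MB; mapping a 1 GB working set takes 262,144 small-page translations but only 512 large-page ones.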

If you compare this with the Intel Penryn family:

  • One instruction TLB: 128 entries (4 KB), but only 8 entries for 2 MB pages
  • A two-level data TLB:
      ◦ First level: 16 entries (4 KB)
      ◦ Second level: 256 entries (4 KB), but only 32 for large (2 MB) pages

You can see that AMD's K10 family has much larger TLBs than Penryn and earlier Intel CPUs, especially if you run with large pages. So while this will certainly not affect desktop or notebook users, it may well have an impact in the server world.
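The same reach arithmetic, applied to the second-level TLBs of both families as listed above, shows why large pages favor K10. This is a back-of-the-envelope Python sketch: it only multiplies entries by page size, assumes every entry is in use, and ignores associativity, latency and the first-level TLBs. The dictionary and function names are our own.

```python
KB = 1024
MB = 1024 * KB

# Second-level (data) TLB capacities as listed in the article:
# (entries for 4 KB pages, entries for 2 MB pages)
L2_DTLB = {
    "AMD K10": (512, 128),
    "Intel Penryn": (256, 32),
}


def reach_mb(entries: int, page_size: int) -> float:
    """Memory (in MB) mapped by `entries` translations of `page_size` bytes."""
    return entries * page_size / MB


if __name__ == "__main__":
    for cpu, (small, large) in L2_DTLB.items():
        print(f"{cpu:13s} 4 KB pages: {reach_mb(small, 4 * KB):6.1f} MB   "
              f"2 MB pages: {reach_mb(large, 2 * MB):6.1f} MB")
```

With 2 MB pages, K10's second-level TLB covers four times as much memory as Penryn's (256 MB versus 64 MB), whereas with 4 KB pages the gap is only a factor of two (2 MB versus 1 MB).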

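VMware ESX 3.5 does not yet support Nested Paging; support will arrive in an upcoming update. Nested paging benefits greatly from large TLBs: the TLB holds translations for each guest OS, and a TLB miss forces the hardware to walk both the guest's page tables and the nested (host) page tables, which is far more expensive than a native page walk. But even with shadow paging, big TLBs should help when you have a lot of VMs running.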
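To see why TLB size matters so much under nested paging, compare the worst-case number of memory references needed to resolve a TLB miss. The Python sketch below counts them for four-level page tables in both guest and host; it is a simplified model that ignores the page-walk caches real processors use to shorten these walks, and the function names are our own.

```python
def native_walk_refs(levels: int) -> int:
    """Memory references for a native page walk: one read per table level."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for a two-dimensional (nested) page walk.

    Each of the guest_levels guest page-table entries sits at a guest-physical
    address, which must first be translated through the host tables
    (host_levels reads) before the entry itself is read (1 read). The final
    guest-physical address of the data then needs one more host walk.
    """
    return guest_levels * (host_levels + 1) + host_levels


if __name__ == "__main__":
    # x86-64 uses 4-level page tables in both the guest and the host.
    print("native walk:", native_walk_refs(4), "memory references")
    print("nested walk:", nested_walk_refs(4, 4), "memory references")
```

With four levels on each side, a miss can cost up to 24 memory references instead of 4, so every miss that a larger TLB avoids saves considerably more work than it would natively.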

We still have quite a bit of benchmarking to do, but it is clear that the TLB architecture of Barcelona deserves some positive attention too. It will be very interesting to see what kind of TLB architecture Nehalem will have, as Nehalem will be the first Intel CPU to support Extended Page Tables (EPT, Intel's counterpart to Nested Paging).

It is interesting to note that Nehalem has a NEW second-level, 512-entry TLB…


7 Comments


  • flicker180 - Monday, March 17, 2008

    Johan,

    NPT is already present in ESX 3.5. You may have read an old slide somewhere, but in ALL VMware documentation (both internal and external), VMware acknowledges the implementation of NPT in ESX 3.5. I'm looking at a VMware slide deck right now that says as much. We're running it in our lab right now on B3 Barcelonas with no issues whatsoever.

    cheers,

    Dave Graham
  • Visual - Tuesday, March 18, 2008

    Sorry for the completely off-topic post.
    But I'm really curious, Mr. Dave Graham: did you eventually get Barcelona running on your Quad FX board?

    I know Quad FX is officially dead now, but if it finally works with quad-cores, it may still be worth getting one of these ancient boards... and who knows, some motherboard maker might even build a Quad FX board with a more recent chipset, despite AMD giving up on the platform.
    Alternatively, could a server board ever work with normal unbuffered RAM?
  • flicker180 - Tuesday, March 18, 2008

    Visual,

    Hey, never got around to it... got tied up testing Tyan's GT28 systems (the dual-twin 1U Johan talked about with regard to CeBIT). I've spent most of my time regression-testing BIOS issues on production units there. However, ESX 3.5 is running quite happily on Barcelona B3 2352s using Tyan S3992-E boards. ;)

    cheers,

    Dave
  • JohanAnandtech - Tuesday, March 18, 2008

    I am looking for a way to verify this, but I heard at VMworld 2008 (March 2008), in a session by VMware architect Richard Brunner, that it was going to be enabled in one of the upcoming updates of 3.5.

    Do you have proof? :-) Running without issues doesn't mean that NPT is enabled.
  • flicker180 - Tuesday, March 18, 2008

    Johan,

    I can't give you access to VMware internal documentation, so I'll present what I can. You can email me directly if you wish; ask Kris Kubicki for my EMC email address.

    cheers,

    dave
  • OndrejSc - Monday, March 17, 2008

    In theory a marginal design advantage like this could result in a tangible performance benefit. But there are far greater design advantages (integrated memory controller) that still aren't able to redeem the lackluster overall speed.
  • JohanAnandtech - Tuesday, March 18, 2008

    The gain is too small in most applications, but not in virtualized ones. NPT gives us a 10-20% performance difference... far from marginal. If you consider that page table updates can cost 4 to 1,000 times more cycles in a virtualized environment than in a native one, it is clear that TLB flushes are a lot more costly there.

    You are right in a native environment, but wrong about virtualized servers: TLB size does matter there!
