Introduction to Server Benchmarking

Each time we publish a new server platform review, several of our readers inquire about HPC and rendering benchmarks. We're always willing to accommodate reasonable requests, so we're going to start expanding beyond our usual labor-intensive virtualization benchmarks. This article is our first attempt. It was a bumpy ride, but it produced some very interesting insights.

The core counts of modern servers have increased at an incredible pace, making many benchmarks useless if we want to assess maximum throughput. Just three years ago, we could still run benchmarks like Fritz Chess, WinRAR, and zVisuel to satisfy our curiosity. We also performed real-world benchmarks like MySQL OLAP on our eight-core servers. All these benchmarks are pretty useless now on our 48-core Magny-Cours and 80-thread Westmere-EX systems. The number of applications that can really take advantage of the core counts found in quad- and even dual-socket servers keeps shrinking.

Most servers are now running hypervisors and virtualization of some form, so we naturally focused on virtualized environments. However, many of our readers are hardware enthusiasts, so while we wait for new server platforms such as Intel's Romley-EP (Sandy Bridge EP) and AMD's Interlagos (Bulldozer) to appear, we decided to expand our benchmark suite. Our first attempt is not very ambitious: we'll tackle Cinebench (rendering) and Stars Euler 3D CFD (HPC). Both are quick and easy benchmarks to perform... or at least that's what we expected going in. On the plus side, our testing results are a lot more interesting than we imagined they would be.


  • proteus7 - Tuesday, October 11, 2011 - link

    STREAM triad on a 4S Xeon E7 should hit about 65GB/s, unless your memory or UEFI/BIOS options are misconfigured. Firmware settings can make a HUGE difference on these systems.

    Did you:
    Enable Hemisphere mode?
    Disable HT?
    If running Windows, I assume it was Server 2008 R2 SP1?
    If running Windows, realize that only certain applications compiled with specific flags will work on core counts over 64 (kgroup0)? Not an issue if HT was off.
    Enable prefetch modes in firmware?
    Ensure system firmware was set to max perf, and not power-saving modes?
    If running Windows, set power options to the max performance profile? (The default power profile on Server drops perf substantially for short burst benchmarks.)

    TPC-E is also a great benchmark to run (it needs some SSD storage/Fusion-io). HPCC/Linpack are good for HPC testing.
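
For context on the STREAM triad figure in the comment above, here is a minimal sketch of what the triad kernel does and how a GB/s number is derived from it. It assumes the usual STREAM accounting of three 8-byte arrays touched per element (two read, one written). The function names are hypothetical, and pure Python is interpreter-bound and does not store doubles contiguously, so this illustrates the arithmetic only and is not a usable memory bandwidth test.

```python
import time

BYTES_PER_DOUBLE = 8   # STREAM works on 64-bit floating-point elements
ARRAYS_TOUCHED = 3     # triad reads b and c, and writes a


def triad(a, b, c, q):
    """STREAM triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth_gbs(n_elements, seconds):
    """Bandwidth in GB/s (10^9 bytes/s), counting 24 bytes per element."""
    bytes_moved = ARRAYS_TOUCHED * BYTES_PER_DOUBLE * n_elements
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    n = 1_000_000
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n

    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start

    # The printed figure reflects interpreter overhead, not DRAM speed.
    print(f"{triad_bandwidth_gbs(n, elapsed):.3f} GB/s")
```

Under this accounting, a system sustaining 65GB/s would complete a triad pass over one billion elements (24GB of traffic) in roughly 0.37 seconds.
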
  • pventi - Monday, October 31, 2011 - link

    As you can read in the icc manual, when running on non-Intel processors the non-temporal prefetches are not implemented in the final machine code. This alone means the Opteron result could be up to 27% faster.

    Another reason it's slower is that the "standard" HW configuration of the Opteron throttles the DRAM prefetchers when under load.
    Under Linux this behaviour can be changed from the shell and should add another 5~10% in performance.

    So this benchmark should show a ~30% higher number for the Opteron.

    Best Regards
