Intel's newest Quad Xeon MP versus HP's DL585 Quad Opteron
by Johan De Gelas on November 10, 2006 12:00 PM EST - Posted in IT Computing
Secure Socket Layers RSA Performance
Secure web communication is possible through the Secure Sockets Layer (SSL). Using "openssl speed rsa" we can measure the number of RSA private key (sign) and public key (verify) operations that a system can perform per second using OpenSSL 0.9.8a. Both the verifies/s and signs/s benchmarks are rather synthetic, but they give an idea of the "pure" encrypting and decrypting speed.
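As a rough illustration of how such numbers can be collected, the sketch below runs the benchmark and pulls out the sign/s and verify/s columns. The function names are hypothetical, the sample text uses made-up placeholder figures (not measurements from this article), and the regular expression assumes a summary line of the form "rsa 2048 bits <sign time> <verify time> <sign/s> <verify/s>", which may differ between OpenSSL versions.

```python
import re
import subprocess

# Assumed summary line layout (may vary by OpenSSL version):
#   rsa 2048 bits <sign time> <verify time> <sign/s> <verify/s>
RSA_LINE = re.compile(
    r"^rsa\s+(\d+)\s+bits\s+\S+\s+\S+\s+([\d.]+)\s+([\d.]+)\s*$",
    re.MULTILINE,
)


def parse_rsa_speed(text):
    """Map key size in bits to a (signs per second, verifies per second) pair."""
    results = {}
    for bits, signs, verifies in RSA_LINE.findall(text):
        results[int(bits)] = (float(signs), float(verifies))
    return results


def run_openssl_speed(algorithm="rsa"):
    """Run 'openssl speed <algorithm>' and parse its summary (needs openssl on PATH)."""
    completed = subprocess.run(
        ["openssl", "speed", algorithm],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rsa_speed(completed.stdout)


# Placeholder output with invented values, used only to exercise the parser.
SAMPLE = """\
                  sign    verify    sign/s verify/s
rsa 1024 bits   0.0020s   0.0001s    500.0  10000.0
rsa 2048 bits   0.0100s   0.0005s    100.0   2000.0
"""

parsed = parse_rsa_speed(SAMPLE)
print(parsed[2048])
```

Calling run_openssl_speed() on a real machine would replace the placeholder sample with actual measurements for that system.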
Note that this time we did not compile OpenSSL with specific flags for each architecture (-march=xxx) but used the same flags on each CPU. We feel that this better reflects the real-world use of SSL, as most people do not know the specific CPU architecture they are running on. So we compiled with the following on all x86 systems:
gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -O3 -Wa,-noexecstack -g -Wall -DMD32_REG_T=int -DMD5_ASM
We also included the T2000 numbers with MAU acceleration via the Solaris Cryptographic Framework from our previous server CPU shootout. One thread of OpenSSL signing per core is optimal, so we tested the quad Xeon MP 7130 with a maximum of 16 threads, as there are 8 physical but 16 logical cores.
Compared to our previous findings, the Opteron 2.4 GHz no longer (slightly) beats the 3 GHz Xeon DP 5160. This is the result of replacing a "compiled specifically for each architecture" binary with a binary that is compiled with the more generic -O3 optimization, which as stated is more realistic. Still, our previous conclusion stands: clock for clock, the Opteron is quite a bit better at this than the Xeon "Core" architecture (Xeon 5160) and a lot better than the Xeon "NetBurst" architecture (Xeon MP 7130). Despite being clocked 20% lower than the Xeon 5160, it is only 9% slower at 4 threads. The 8 MAUs of the Sun T1 still give the 1 GHz Sun the edge when we fire off 32 "SSL RSA Signing" threads.
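The clock-for-clock claim can be checked with a little arithmetic. The sketch below uses only the figures quoted in the text (2.4 GHz versus 3 GHz, and the Opteron being 9% slower at 4 threads); the function and variable names are illustrative, not part of any benchmark tool.

```python
def per_clock_ratio(clock_a_ghz, clock_b_ghz, throughput_a_over_b):
    """Per-clock throughput of system A relative to system B."""
    clock_ratio = clock_a_ghz / clock_b_ghz
    return throughput_a_over_b / clock_ratio


opteron_ghz = 2.4
xeon_5160_ghz = 3.0
# The Opteron is 9% slower in absolute signs/s at 4 threads.
opteron_relative_throughput = 1.0 - 0.09

ratio = per_clock_ratio(opteron_ghz, xeon_5160_ghz, opteron_relative_throughput)
print(f"Opteron per-clock advantage: {(ratio - 1) * 100:.0f}%")
```

Under these figures the Opteron does roughly 14% more signing work per clock cycle than the Xeon 5160, which is consistent with "quite a bit better" clock for clock.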
In the case of doing verifies, the server has to authenticate the identity of the client. This is a lot less intensive, and we show you the verifies/s numbers at 2048 bits. At 1024 bits length, both the Woodcrest and Opteron were able to verify more than 50,000 keys per core, and that is a hard limit of the OpenSSL benchmark.
Again, the Opteron takes the lead. Encrypting or signing will slow down a server much quicker than verifying keys, so this benchmark is of smaller importance than the sign/s benchmark.
88 Comments
JohanAnandtech - Saturday, November 11, 2006 - link
Well, we did mention it in our price comparison. From a performance point of view, the G2 is within 2% of the DL585 given a similar configuration.
Getting a server in the lab is not like getting a video chip for review. The machines are much more expensive, and you need much more time to review them properly. So OEMs are less likely to send you the necessary hardware. For a video card they send out a $500 item that can be reviewed in a few weeks, maybe even a few days. For servers like these, they have to send out a $20000 machine and be able to miss it for a month or two at the least.
Viditor - Saturday, November 11, 2006 - link
I can certainly understand and empathise with the situation...and I did enjoy the article, Johan!
The reason I mentioned it is that line in your conclusion...
I thought that (considering the circumstances) it was a bit unfair and misleading...
JohanAnandtech - Saturday, November 11, 2006 - link
I just pointed out that it is a bit weird that a newer revision of the DL585 (it was thé HP Opteron machine just a few months ago) used SCSI 160. There is no reason at all why HP could not replace this: they revised the server anyway.
I should have mentioned that this was solved in the G2, but still it is a missed chance... even though I reported it a bit too late :-)
photoguy99 - Friday, November 10, 2006 - link
yes, bring it on!
finalfan - Friday, November 10, 2006 - link
On page The Official SPEC Numbers, in second table SPEC FP 2000 Performance, the positions of (4/8) HP Opteron AM2 and (8/8) Hitachi Itanium 2 should be switched. No Itanium runs at 3.4G and no way a 4way 1.6G AM2 can sit in second place.
JohanAnandtech - Friday, November 10, 2006 - link
Corrected. It is weird, the accurate numbers were in the original document. The generation of the table went wrong. I have double checked and now the FP numbers should all be accurate.
JarredWalton - Friday, November 10, 2006 - link
Probably my fault. I think when it got put into Excel that the various x/y numbers were converted to dates. I thought I fixed all of those, but probably missed one or two. Sorry.
icarus4586 - Friday, November 10, 2006 - link
This report brought to you by the department of redundancy department.
bwmccann - Friday, November 10, 2006 - link
When are you guys going to start benchmarking server CPUs using applications that are widely used in organizations on a daily basis?
Most companies have a very high percentage of servers running Windows. With that I would love to see some tests on SQL, Oracle, Exchange, and other core components of enterprises today.
Also it would be nice to see a closer comparison of the servers. For example you tested a DL585. A DL580 (Intel Woodcrest) would have been better suited since some of the components would be the same.
JohanAnandtech - Friday, November 10, 2006 - link
http://www.anandtech.com/IT/showdoc.aspx?i=2793
Most of the time Jason does the Windows benchmarking, me and my team do the Linux benchmarking.
Java, MySQL and SSL are also core components of many enterprise apps.
We are working on Oracle and got access to a real-world Oracle database a few weeks ago (for the first time), but it takes time to really understand what your benchmark is telling you and how you must configure your db. And Oracle is... very stubborn, even patching to a slightly higher version can lead to big trouble.
The DL585 is a direct competitor (quad socket) in this space, more so than the DL580 (dual socket).