Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI

Author: Johan De Gelas

by Johan De Gelas on July 29, 2019 8:30 AM EST

Convolutional, Recurrent, & Scalability: Finding a Balance

Despite the fact that Intel's Xeon Phi was a market failure as an accelerator and has been discontinued, Intel has not given up on the concept. The company still wants a bigger piece of the AI market, including pieces that may otherwise be going to NVIDIA.

To quote Intel’s Naveen Rao:

Customers are discovering that there is no single “best” piece of hardware to run the wide variety of AI applications, because there’s no single type of AI.

And Naveen makes a salient point. Although NVIDIA has never claimed to provide the best hardware for all types of AI, a superficial look at the benchmarks most cited in press releases across the industry (ResNet, Inception, etc.) would almost make you believe that only one type of AI matters. Convolutional Neural Networks (CNNs or ConvNets) dominate the benchmarks and product presentations, as they are the most popular technology for analyzing images and video. Anything that can be expressed as “2D input” is a potential candidate for the input layers of these popular neural networks.

Some of the most spectacular breakthroughs in recent years have been made with CNNs. It’s no accident that ResNet performance has become such a popular benchmark, for example. The associated ImageNet database, a collaboration between Stanford University and Princeton University, contains fourteen million images; and until the last decade, AI performance on recognizing those images was very poor. CNNs changed that in short order, and it has been one of the most popular AI challenges ever since, as companies look to outdo each other in categorizing this database faster and more accurately than ever before.

To put all of this on a timeline: as early as 2012, AlexNet, a relatively simple neural network, achieved significantly better accuracy than traditional machine learning techniques in an ImageNet classification competition. In that test, it achieved 85% accuracy, or a 15% error rate, which is roughly half of the 27% error rate of more traditional approaches (73% accuracy).
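The comparison is easier to follow when accuracy is converted to error rate. The following is a minimal sketch that uses only the two accuracy percentages quoted above; the function and variable names are illustrative.

```python
def error_rate(accuracy_pct: float) -> float:
    """Convert an accuracy percentage into an error-rate percentage."""
    return 100.0 - accuracy_pct


# Accuracy figures quoted in the text for the 2012 ImageNet competition.
alexnet_err = error_rate(85.0)       # AlexNet
traditional_err = error_rate(73.0)   # traditional machine learning approaches

# Fraction of the traditional error rate that AlexNet still makes.
ratio = alexnet_err / traditional_err

print(f"AlexNet error: {alexnet_err:.0f}%, traditional error: {traditional_err:.0f}%")
print(f"AlexNet makes {ratio:.0%} of the errors of the traditional approaches")
```

A 12-point gain in accuracy therefore corresponds to cutting the number of misclassified images by a little under half.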

In 2015, the famous Inception V3 achieved a 3.58% error rate in classifying the images, which is similar to (or even slightly better than) a human. The ImageNet challenge got harder, but CNNs got better even without increasing the number of layers, courtesy of residual learning. This led to the famous “ResNet” CNN, now one of the most popular AI benchmarks. To cut a long story short, CNNs are the rockstars of the AI universe. They get by far the most attention, testing, and research.
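For readers unfamiliar with residual learning, the core idea fits in a few lines: a block adds its own input back to its output through a skip connection. This is a minimal NumPy illustration, not ResNet itself. The dense layers stand in for convolutions, and the names `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Compute relu(x + F(x)), where F is two small dense layers.

    The dense layers are an assumption made for brevity; a real ResNet
    block uses convolutions and normalization instead.
    """
    fx = relu(x @ w1) @ w2   # the learned "residual" F(x)
    return relu(x + fx)      # skip connection: the input is added back


# With all-zero weights F(x) is zero, so the block simply passes relu(x) through.
x = np.array([[1.0, -2.0, 3.0]])
zero_w = np.zeros((3, 3))
identity_out = residual_block(x, zero_w, zero_w)
print(identity_out)
```

Because the block only has to learn a correction on top of its input, an untrained block behaves close to a pass-through rather than scrambling the signal.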

CNNs are also very scalable: adding more GPUs lowers a network’s training time (almost) linearly. Put bluntly, CNNs are a gift from the heavens for NVIDIA. CNNs are the most common reason why people invest in NVIDIA's expensive DGX servers ($400k) or buy multiple Tesla GPUs ($7k+).

Still, there is more to AI than CNNs. Recurrent Neural Networks (RNNs), for example, are also popular for speech recognition, language translation, and time series.

This is why the MLPerf benchmark initiative is so important. For the first time, we are getting a benchmark that is not completely dominated by CNNs.

Taking a quick look at MLPerf, the image and object classification benchmarks are CNNs of course, but RNNs (via neural machine translation) and collaborative filtering are also represented. Meanwhile, even the recommendation engine test is based on a neural network, so technically speaking there is no "traditional" machine learning test included, which is unfortunate. But as this is version 0.5 and the organization is inviting more feedback, it is certainly promising, and once it matures we expect it to be the best benchmark available.

Looking at some of the first data, however, via Dell’s benchmarks, it is crystal clear that not all neural networks are as scalable as CNNs. While ResNet CNN performance easily quadruples when you move to four times the number of GPUs (and add a second CPU), the collaborative filtering method offers only 50% higher performance.
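One way to put a number on this gap is scaling efficiency: the achieved speedup divided by the factor by which the GPU count grew. The sketch below uses only the figures quoted above (4x GPUs, 4x performance for ResNet, 1.5x for collaborative filtering). The function name is illustrative, and the added second CPU is ignored for simplicity.

```python
def scaling_efficiency(speedup: float, gpu_factor: float) -> float:
    """Return achieved speedup as a fraction of ideal (linear) scaling."""
    return speedup / gpu_factor


GPU_FACTOR = 4.0  # four times the number of GPUs

# "Easily quadruples" is treated here as exactly 4x, a simplifying assumption.
resnet_eff = scaling_efficiency(4.0, GPU_FACTOR)

# "50% higher performance" corresponds to a 1.5x speedup.
ncf_eff = scaling_efficiency(1.5, GPU_FACTOR)

print(f"ResNet scaling efficiency: {resnet_eff:.1%}")
print(f"Collaborative filtering scaling efficiency: {ncf_eff:.1%}")
```

Under these assumptions, ResNet uses the extra GPUs at full efficiency, while collaborative filtering gets well under half of the ideal benefit from the same hardware.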

In fact, quite a bit of academic research revolves around optimizing and adapting CNNs so they handle these sequence modelling workloads just as well as RNNs, and as a result can replace the less scalable RNNs.

56 Comments

tipoo - Monday, July 29, 2019 - link
Fyi, when on page 2 and clicking "convolutional, etc" for page 3, it brings me back to the homepage
Ryan Smith - Monday, July 29, 2019 - link
Fixed. Sorry about that.
Eris_Floralia - Monday, July 29, 2019 - link
Johan's new piece in 14 months! Looking forward to your Rome review :)
JohanAnandtech - Monday, July 29, 2019 - link
Just when you think nobody noticed you were gone. Great to come home again. :-)
Eris_Floralia - Tuesday, July 30, 2019 - link
Your coverage on server processors are great!
Can still well remember Nehalem, Barcelona, and especially Bulldozer aftermath articles
djayjp - Monday, July 29, 2019 - link
Not having a Tesla for such an article seems like a glaring omission.
warreo - Monday, July 29, 2019 - link
Doubt Nvidia is sourcing AT these cards, so it's likely an issue of cost and availability. Titan is much cheaper than a Tesla, and I'm not even sure you can get V100's unless you're an enterprise customer ordering some (presumably large) minimum quantity.
olafgarten - Monday, July 29, 2019 - link
It is available https://www.scan.co.uk/products/32gb-pny-nvidia-te...
abufrejoval - Tuesday, July 30, 2019 - link
Those bottlenecks are over now and P100, V100 can be bought pretty freely, as well as RTX6000/8000 (Turings). Actually the "T100" is still missing and the closest siblings (RTX 6000/8000) might never get certified for rackmount servers, because they have active fans while the P100/V100 are designed to be cooled by server fans. I operate a handful of each and getting budget is typically the bigger hurdle than purchasing.
SSNSeawolf - Monday, July 29, 2019 - link
I've been trying to find more information on Cascade Lake's AI/VNNI performance, but came up dry. Thanks, Johan. Eagerly putting this aside for my lunch reading today.
