Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AIby Johan De Gelas on July 29, 2019 8:30 AM EST
Convolutional, Recurrent, & Scalability: Finding a Balance
Despite the fact that Intel's Xeon Phi was a market failure as an accelerator and has been discontinued, Intel has not given up on the concept. The company still wants a bigger piece of the AI market, including pieces that may otherwise be going to NVIDIA.
To quote Intel’s Naveen Rao:
Customers are discovering that there is no single “best” piece of hardware to run the wide variety of AI applications, because there’s no single type of AI.
And Naveen makes a salient point. Because although NVIDIA has never claimed that they provide the best hardware for all types of AI, superficially looking at the most cited benchmarks in press releases across the industry (ResNet, Inception, etc) you would almost believe there was only one type of AI that matters. Convolutional Neural Networks (CNNs or ConvNets) dominate the benchmarks and product presentations, as they are the most popular technology for analyzing images and video. Anything that can be expressed as “2D input” is a potential candidate for the input layers of these popular neural networks.
Some of the most spectacular breakthroughs in recent years have been made with the CNNs. It’s no mistake that ResNet performance has become so popular, for example. The associated ImageNet database, a collaboration between Stanford University and Princeton University, contains fourteen million images; and until the last decade, AI performance on recognizing those images was very poor. CNNs changed that in quick order, and it has been one of the most popular AI challenges ever since, as companies look to outdo each out in categorizing this database faster and more accurately than ever before.
To put all of this on a timeline, as early as 2012, AlexNet, a relatively simple neural network, achieved significantly better accuracy than the traditional machine learning techniques in an ImageNet classification competition. In that test, it achieved an 85% accuracy rate, which is almost half of the error rate of more traditional approaches, which achieved 73% accuracy.
In 2015, the famous Inception V3 achieved a 3,58% error rate in classifying the images, which is similar to (or even slightly better than) a human. The ImageNet challenge got harder, but CNNs got better even without increasing the number of layers, courtesy of residual learning. This led to the famous “ResNet” CNN, now one of the most popular AI benchmarks. To cut a long story short, CNNs are the rockstars of the AI Universe. They get by far most of the attention, testing, and research.
CNNs are also very scalable: adding more GPUs scales (almost) linearly in lowering a network’s training time. Put bluntly, CNNs are a gift from the heavens for NVIDIA. CNNs are the most common reason for why people invest in NVIDIAs expensive DGX servers ($400k) or buy multiple Tesla GPUs ($7k+).
Still, there is more to AI than CNNs. Recurrent Neural Networks for example are also popular for speech recognition, language translation, and time series.
This is why the MLperf benchmark initiative is so important. For the first time, we are getting a benchmark that is not dominated completely by CNNs.
Taking a quick look at MLperf, the Image and object classification benchmarks are CNNs of course, but RNNs (via Neural machine translation) and collaborative filtering are also represented. Meanwhile, even the recommendation engine test is based on a neural network; so technically speaking there is no "traditional" machine learning test included, which is unfortunate. But as this is version 0.5 and the organization is inviting more feedback, it sure is promising and once it matures, we expect it to be the best benchmark available.
Looking at some of the first data, however, via Dell’s benchmarks, it is crystal clear that not all neural networks are as scalable as CNNs. While the ResNet CNN easily quadruples when you move to four times the number of GPUs (and add a second CPU), the collaborative filtering method offers only 50% higher performance.
In fact, quite a bit of academic research revolves around optimizing and adapting CNNs so they handle these sequence modelling workloads just as well as RNNs, and as result can replace the less scalable RNNs.