Machine Learning Inference Performance

AIMark 3

AIMark uses various vendor SDKs to implement its benchmarks. This means that the end results aren’t a proper apples-to-apples comparison; however, they do represent an approach that some vendors will actually use in their in-house applications, or even in the rare third-party app.

[Charts: 鲁大师 / Master Lu - AIMark 3 - InceptionV3, ResNet34, MobileNet-SSD, DeepLabV3]

In AIMark 3, the benchmark uses each vendor’s proprietary SDK to accelerate the NN workloads as optimally as possible. For Qualcomm’s devices, this seemingly means that the benchmark is also able to take advantage of the new Tensor cores. Here, the performance improvements of the new Snapdragon 865 are outstanding, with the chip posting 2-3x the performance of its predecessor.

AIBenchmark 3

AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI to accelerate inferencing, meaning it doesn’t use any proprietary aspects of a given piece of hardware except for the drivers that actually enable the abstraction between software and hardware. This approach is more apples-to-apples, but it also means that we can’t do cross-platform comparisons, such as testing iPhones.

We’re publishing one-shot inference times. These differ from sustained-performance inference times in that they include more timing overhead from the software stack, from initialising the test to actually executing the computation.
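
To make the distinction concrete, here is a minimal sketch of how the two kinds of timing differ. It is not how AIBenchmark itself measures anything: `measure_latency`, `make_model` and `run_inference` are hypothetical names, and the "model" is a stand-in workload rather than a real neural network.

```python
import time


def measure_latency(make_model, run_inference, warm_runs=20):
    """Return (one_shot_seconds, sustained_seconds) for a given workload."""
    # One-shot: the clock covers initialisation plus the very first inference.
    start = time.perf_counter()
    model = make_model()
    run_inference(model)
    one_shot = time.perf_counter() - start

    # Sustained: average over repeated runs on the already-initialised model.
    start = time.perf_counter()
    for _ in range(warm_runs):
        run_inference(model)
    sustained = (time.perf_counter() - start) / warm_runs
    return one_shot, sustained


# Stand-in workload (an assumption for illustration only).
def make_model():
    time.sleep(0.05)      # pretend driver and model initialisation cost
    return [0.5] * 1000   # pretend weights


def run_inference(weights):
    return sum(w * w for w in weights)


one_shot, sustained = measure_latency(make_model, run_inference)
print(f"one-shot: {one_shot * 1e3:.2f} ms, sustained: {sustained * 1e3:.3f} ms")
```

In this toy setup the one-shot figure is dominated by the simulated initialisation cost, which is the kind of software-stack overhead the one-shot numbers on this page include.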

AIBenchmark 3 - NNAPI CPU

We’re segregating the AIBenchmark scores by execution block, starting off with the regular CPU workloads that simply use TensorFlow libraries and do not attempt to run on specialized hardware blocks.

[Charts: AIBenchmark 3 - 1 - The Life - CPU/FP; 2 - Zoo - CPU/FP; 3 - Pioneers - CPU/INT; 4 - Let's Play - CPU/FP; 7 - Ms. Universe - CPU/FP; 7 - Ms. Universe - CPU/INT; 8 - Blur iT! - CPU/FP]

Starting off with the CPU-accelerated benchmarks, we’re seeing some large improvements on the Snapdragon 865. It’s particularly the FP workloads that see big performance increases, and these improvements are likely linked to the microarchitectural improvements of the Cortex-A77.

AIBenchmark 3 - NNAPI INT8

[Charts: AIBenchmark 3 - 1 - The Life - INT8; 2 - Zoo - INT8; 3 - Pioneers - INT8; 5 - Masterpiece - INT8; 6 - Cartoons - INT8]

INT8 workload acceleration in AIBenchmark happens on the HVX cores of the DSP rather than on the Tensor cores, which the benchmark currently doesn’t support. The performance increases here are roughly in line with what we’d expect from iterative clock frequency increases of the IP block.

AIBenchmark 3 - NNAPI FP16

[Charts: AIBenchmark 3 - 1 - The Life - FP16; 2 - Zoo - FP16; 3 - Pioneers - FP16; 5 - Masterpiece - FP16; 6 - Cartoons - FP16; 9 - Berlin Driving - FP16; 10 - WESPE-dn - FP16]

FP16 acceleration on the Snapdragon 865 through NNAPI is likely facilitated by the GPU, and we’re seeing iterative improvements in the scores. Huawei’s Mate 30 Pro leads in the vast majority of the tests, as it’s able to make use of its NPU, which supports FP16 acceleration, and its performance here is significantly ahead of the Qualcomm chipsets.

AIBenchmark 3 - NNAPI FP32

[Chart: AIBenchmark 3 - 10 - WESPE-dn - FP32]

Finally, the FP32 test should be accelerated by the GPU. Oddly enough, the QRD865 doesn’t fare as well here as some of the best S855 devices. It should be noted that today’s results were obtained on an early software stack for the S865; it’s possible, and even very likely, that things will improve over the coming months and that the results will be different on commercial devices.

Overall, AI benchmarks again present a conundrum for us today: the tests need to be continuously developed in order to properly support the hardware. AIBenchmark currently doesn’t make use of the Tensor cores of the Snapdragon 865, so it’s not able to showcase one of the biggest areas of improvement for the chipset. In that sense, these benchmarks don’t really mean very much, and the true power of the chipset will only be exhibited by first-party applications, such as the camera apps of the upcoming Snapdragon 865 devices.

Comments Locked



  • ThreeDee912 - Monday, December 16, 2019 - link

    I feel you Andrei. I'm sitting here facepalming at these comments. I think a lot of people truly do not understand what SPEC was designed for or how energy efficiency works.
  • joms_us - Monday, December 16, 2019 - link

    To an average Joe or Jane, SPEC is a worthless basis of comparison. You can tell the sheep his phone has the fastest SoC on the planet and he will prolly believe you.

    If you can show an iPhone can finish a bunch of tasks in half a day and bunch of tasks on Android phone in a whole day then I will believe you that iPhone has twice the performance versus competition. But if you are just showing a nanosecond difference between two phones and thousand difference in benchmark scores then keep your palm on your face. =D
  • s.yu - Tuesday, December 17, 2019 - link

    I think Andrei has made it clear enough, perhaps not for you, but then Anandtech is not the site for you. Go visit Engadget or something you'll fit right in.
  • jospoortvliet - Monday, December 16, 2019 - link

    Same here. 🤦‍♀️🤦‍♂️🤦‍♀️🤦‍♂️
  • joms_us - Monday, December 16, 2019 - link

    You must have spent thousand of dollars on expensive phones because the SPEC result is higher on those phone? LOL

    You buy them to run SPEC? LOL
  • milli - Monday, December 16, 2019 - link

    I remember reading an article a couple years ago, where it was mentioned that a couple key BitBoys staff members left the company. The writing has been on the walls for years and recently Adreno architectural development has slowed down to a halt.
  • trivik12 - Monday, December 16, 2019 - link

    While Apple cores are faster, Android flagships will come shitloads of memory and so when it comes to daily tasks it will still keep in pace. S11+ will supposedly start at 12GB LPDDR5 ram vs 4GB ram for Apple flagships.

    At this point performance is not the issue for these android flagships considering the workloads of mobile phone. I would prefer them to make it more efficient working with Google at OS level. iphone's big advantage is how efficient it is relative to battery size of its phone. Key metrics are web browsing on Wifi and LTE plus video playback (streaming on netflix).
  • NetMage - Friday, December 27, 2019 - link

    iPhone is also efficient at RAM usage - native code versus JIT bytecode gives iOS a 1.5x to 2x less RAM advantage over Android.
  • cha0z_ - Friday, December 27, 2019 - link

    As already said - ios is a lot less RAM hungry and it's efficient. 4GB is quite enough + most android phones with a lot of memory loves to drop apps from there too. Not to mention that you will not notice that speed difference till you try to do something demanding power... and buying a phone for 1k euro just to browse FB is a bad buy decision anyway (for anyone except those who have money to burn ofc).

    But you will notice the efficiency difference. My iphone 11 pro max will last twice and more times the exynos note 9 I got in light workloads. The same iphone will last x3+ times more in heavy workloads while giving smooth and fast performance/gaming in contrary to the note 9.
  • quiksilvr - Monday, December 16, 2019 - link

    I will wait until they develop later processors with 5G built in.
