CPU Performance

Before we evaluate the Galaxy S8’s system-level performance and battery life, we’ll run some lower-level tests to examine CPU integer and floating-point IPC as well as memory system throughput and latency. The first test is SPECint2000, the integer component of the SPEC CPU2000 benchmark developed by the Standard Performance Evaluation Corporation. This collection of single-threaded workloads allows us to compare IPC for competing CPU microarchitectures. The scores below are not officially validated numbers, which would require the tests to be supervised by SPEC, but we’ve done our best to choose appropriate compiler flags and to get the tests to pass internal validation.

SPECint2000 - Estimated Scores (ARMv8 / AArch64)

| Test | Exynos 8895 (Galaxy S8) | Snapdragon 835 (Galaxy S8) | Snapdragon 821 (LeEco Le Pro3) | Kirin 960 (Mate 9) |
| --- | --- | --- | --- | --- |
| 164.gzip | 1120 | 1207 | 1273 | 1217 |
| 175.vpr | 3889 | 3889 | 1687 | 4118 |
| 176.gcc | 2000 | 1930 | 1746 | 2157 |
| 181.mcf | 1268 | 1146 | 1200 | 1118 |
| 186.crafty | 2083 | 2222 | 1613 | 2222 |
| 197.parser | 1125 | 1364 | 1059 | 1395 |
| 252.eon | 3333 | 3333 | 3714 | 3421 |
| 253.perlbmk | 1698 | 1714 | 1513 | 1748 |
| 254.gap | - | 1864 | 1594 | 1930 |
| 255.vortex | 2235 | 1900 | 1712 | 2111 |
| 256.bzip2 | 1351 | 1376 | 1172 | 1402 |
| 300.twolf | 2113 | 2419 | 847 | 2479 |

The peak CPU operating frequencies for all the SoCs in the table above fall within 2% of each other, making it easier to compare IPC. It’s interesting to see how close the Exynos 8895’s M2 cores are to the Snapdragon 835’s Kryo 280 cores in integer performance. Each core separates itself in a couple of tests (the S835 is faster in parser by 21% and in twolf by 14%, while the E8895 is faster in vortex by 18%), but the performance differences are generally less than 10%.
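The percentages quoted above can be reproduced from the table with a few lines of Python. This is a minimal sketch rather than part of our test harness: the scores are copied from the Exynos 8895 and Snapdragon 835 columns, gap is left out because the E8895 has no result for it, and the helper names (`percent_lead`, `compare`) are made up for illustration.

```python
# Estimated SPECint2000 scores from the table: (Exynos 8895, Snapdragon 835).
# 254.gap is omitted because there is no Exynos 8895 result.
SCORES = {
    "gzip": (1120, 1207),
    "vpr": (3889, 3889),
    "gcc": (2000, 1930),
    "mcf": (1268, 1146),
    "crafty": (2083, 2222),
    "parser": (1125, 1364),
    "eon": (3333, 3333),
    "perlbmk": (1698, 1714),
    "vortex": (2235, 1900),
    "bzip2": (1351, 1376),
    "twolf": (2113, 2419),
}


def percent_lead(a, b):
    """How much faster the higher score is, as a percentage of the lower one."""
    high, low = max(a, b), min(a, b)
    return (high / low - 1.0) * 100.0


def compare(scores):
    """Return (test, leader, lead in percent) for every test."""
    rows = []
    for name, (e8895, s835) in scores.items():
        if e8895 == s835:
            leader = "tie"
        elif e8895 > s835:
            leader = "E8895"
        else:
            leader = "S835"
        rows.append((name, leader, percent_lead(e8895, s835)))
    return rows


if __name__ == "__main__":
    for name, leader, lead in compare(SCORES):
        print(f"{name:8s} {leader:6s} {lead:5.1f}%")
```

The lead is expressed relative to the slower chip, which is the convention the percentages in the text follow.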

Both the M2 core inside the E8895 and the Cortex-A73 core inside the S835/K960 can dispatch 4 µops/cycle; however, there are some differences between their execution pipelines. The M1 (and, from what I can tell, the M2) has 2 simple ALU/INT pipes for basic operations, such as additions and shifts, and 1 complex pipe for multiplication/division. The A73 has 2 complex ALU/INT pipes. Both can handle basic operations, but only one handles integer multiplication and multiply-accumulate operations, while the other handles integer division. This means the M2 and A73 can both perform 2 basic operations in parallel but cannot perform 2 of the same complex operations in parallel. The A73 can dual issue a MUL/MAC alongside a divide/add/shift, which the M2 cannot do, but the M2 can issue 2 basic operations alongside 1 complex instead of a 1/1 split for A73. Obviously, any increase in throughput resulting from these differences will be highly workload dependent.
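To make the pipe comparison concrete, here is a toy model of which integer operations could issue in the same cycle. It is a rough sketch under stated assumptions, not a description of the real schedulers: it takes the pipe layout above at face value, assumes the M2’s complex pipe accepts only multiply/divide, folds MUL and MAC into one `"mul"` class, and ignores the 4 µop dispatch limit, operand dependencies, and everything outside the integer pipes. The names `M2_PIPES`, `A73_PIPES`, and `can_issue_together` are invented for the example.

```python
from itertools import permutations

# Operation classes each integer pipe accepts (assumed from the description).
# "mul" stands for both MUL and MAC.
M2_PIPES = [{"basic"}, {"basic"}, {"mul", "div"}]   # 2 simple + 1 complex
A73_PIPES = [{"basic", "mul"}, {"basic", "div"}]    # 2 complex pipes


def can_issue_together(pipes, ops):
    """True if every op can be placed on its own pipe that accepts it."""
    if len(ops) > len(pipes):
        return False
    # Try every assignment of distinct pipes to the ops, in order.
    return any(
        all(op in pipe for op, pipe in zip(ops, chosen))
        for chosen in permutations(pipes, len(ops))
    )


if __name__ == "__main__":
    cases = [
        ["basic", "basic"],          # both cores manage this
        ["mul", "mul"],              # neither core has two multiply pipes
        ["mul", "div"],              # A73 only: separate pipes for each
        ["basic", "basic", "mul"],   # M2 only: it has a third integer pipe
    ]
    for ops in cases:
        print(ops, "M2:", can_issue_together(M2_PIPES, ops),
              "A73:", can_issue_together(A73_PIPES, ops))
```

Even in this simplified form, the model shows why neither layout is strictly better: which one wins depends on the instruction mix.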

[Chart: SPECint2000 64b/32b Estimated Ratio/GHz]

The chart above accounts for differences in CPU frequency by dividing the estimated SPECint2000 ratio score by CPU frequency. It makes clear that, in this group of tests at least, there’s no IPC difference between Galaxy S8 models running the Snapdragon 835 or the Exynos 8895. There’s also no significant difference between the Kirin 960’s A73 CPU core and the Snapdragon 835’s semi-custom A73 core. Otherwise, the S835/E8895 hold a 26% IPC advantage over the previous-generation Snapdragon 820/821, whose fully custom Kryo CPU performs on par with the older ARM A57 core when running integer workloads.

While the in-order A53 CPU works very well as a lower-power companion core, building octa-core A53 SoCs does not make much sense. The bigger cores provide a significant performance advantage over the A53 – the older A57 is almost twice as fast here while the A73/M2 offer 2.2x more IPC – yielding a better overall user experience when dealing with the short, bursty workloads common to smartphone use cases. And for those of you still using a phone with a Snapdragon 801 SoC and wondering if buying a new flagship phone would deliver a noticeable performance gain, the answer is yes. The Krait 400 CPU in S801 performs about the same as the A53 in these integer workloads.

Geekbench 4 - Integer Performance (Single Threaded)

| Test | Exynos 8895 (Galaxy S8) | Snapdragon 835 (Galaxy S8) | Snapdragon 821 (LeEco Le Pro3) | Kirin 960 (Mate 9) |
| --- | --- | --- | --- | --- |
| AES | 971.2 MB/s | 905.4 MB/s | 535.8 MB/s | 911.6 MB/s |
| LZMA | 3.20 MB/s | 2.84 MB/s | 2.20 MB/s | 3.03 MB/s |
| JPEG | 17.2 Mpixels/s | 16.0 Mpixels/s | 21.7 Mpixels/s | 16.1 Mpixels/s |
| Canny | 29.1 Mpixels/s | 22.5 Mpixels/s | 31.2 Mpixels/s | 22.5 Mpixels/s |
| Lua | 1.63 MB/s | 1.70 MB/s | 1.43 MB/s | 1.72 MB/s |
| Dijkstra | 1.25 MTE/s | 1.58 MTE/s | 1.41 MTE/s | 1.53 MTE/s |
| SQLite | 42.2 Krows/s | 51.2 Krows/s | 36.5 Krows/s | 51.6 Krows/s |
| HTML5 Parse | 7.63 MB/s | 8.48 MB/s | 7.48 MB/s | 7.99 MB/s |
| HTML5 DOM | 2.50 Melems/s | 2.18 Melems/s | 0.84 Melems/s | 2.15 Melems/s |
| Histogram Equalization | 50.3 Mpixels/s | 49.9 Mpixels/s | 52.3 Mpixels/s | 48.6 Mpixels/s |
| PDF Rendering | 57.4 Mpixels/s | 47.2 Mpixels/s | 53.6 Mpixels/s | 44.6 Mpixels/s |
| LLVM | 231.8 functions/s | 250.1 functions/s | 164.8 functions/s | 260.4 functions/s |
| Camera | 6.58 images/s | 5.47 images/s | 7.17 images/s | 5.45 images/s |

The updated Geekbench 4 workloads give us a second look at integer IPC. Once again the performance difference between the S835 and E8895 is generally between 5% and 10%, although there is a bit more variation in these tests. The E8895 pulls ahead in Canny (29%), PDF Rendering (22%), and Camera (20%), while the S835 holds the advantage in Dijkstra (26%) and SQLite (21%).

[Chart: Geekbench 4 (Single Threaded) Integer Score/GHz]

After accounting for differences in CPU frequency, the E8895’s M2 core shows a minimal 5% advantage over the S835’s Kryo 280 core in the Geekbench integer suite. As expected, the S835 and Kirin 960 perform the same, with both showing a negligible gain relative to the previous-generation SoCs with A72 CPU cores. The integer IPC gap between the S835 and S820/S821 narrows to only 11% in Geekbench 4. The S820/S821 is still no better than SoCs with the A57 core despite posting better results in many of the individual integer tests. Its poor performance in LLVM and HTML5 DOM accounts for its lower overall score.

We tested the IPC of the A53 core using two different SoCs. In the Snapdragon 625, the A53 performs 10% better than its counterpart in the Kirin 655, primarily because of the Snapdragon’s lower memory latency (in-order cores are particularly sensitive to latency). Both A53 examples still manage to outperform the S801, however, by margins ranging from 13% to 24%.

Geekbench 4 - Floating Point Performance (Single Threaded)

| Test | Exynos 8895 (Galaxy S8) | Snapdragon 835 (Galaxy S8) | Snapdragon 821 (LeEco Le Pro3) | Kirin 960 (Mate 9) |
| --- | --- | --- | --- | --- |
| SGEMM | 13.4 GFLOPS | 11.0 GFLOPS | 12.2 GFLOPS | 10.5 GFLOPS |
| SFFT | 4.02 GFLOPS | 2.76 GFLOPS | 3.26 GFLOPS | 2.88 GFLOPS |
| N-Body Physics | 924.5 Kpairs/s | 844.5 Kpairs/s | 1183.3 Kpairs/s | 832.6 Kpairs/s |
| Rigid Body Physics | 6234.9 FPS | 5941.6 FPS | 7169.6 FPS | 5879.2 FPS |
| Ray Tracing | 203.7 Kpixels/s | 220.6 Kpixels/s | 297.7 Kpixels/s | 221.8 Kpixels/s |
| HDR | 9.49 Mpixels/s | 8.13 Mpixels/s | 11.3 Mpixels/s | 8.10 Mpixels/s |
| Gaussian Blur | 27.2 Mpixels/s | 22.2 Mpixels/s | 48.0 Mpixels/s | 23.7 Mpixels/s |
| Speech Recognition | 15.3 Words/s | 13.2 Words/s | 11.5 Words/s | 12.8 Words/s |
| Face Detection | 583.4 Ksubs/s | 512.4 Ksubs/s | 681.4 Ksubs/s | 497.1 Ksubs/s |

While integer IPC is essentially the same between the E8895’s M2 CPU and the S835’s Kryo 280, the E8895’s floating-point IPC is notably higher, surpassing the S835 in every test except Ray Tracing. Its advantage over the S835 is particularly pronounced in the SFFT (46%), SGEMM (22%), and Gaussian Blur (23%) tests. The S820/S821 still has the best overall floating-point IPC of current SoCs, but the E8895 even manages to outperform it at times, taking the lead in the SGEMM, SFFT, and Speech Recognition workloads.

[Chart: Geekbench 4 (Single Threaded) Floating Point Score/GHz]

After taking the geometric mean of the Geekbench 4 floating-point subtest scores and dividing by CPU frequency, the E8895 holds a 17% IPC advantage over the S835. The S820/S821’s IPC is still higher than that of the E8895 and S835, by 11% and 31%, respectively. The S835’s Kryo 280 CPU delivers the same floating-point performance in these workloads as the A72 and A73, with all 3 CPUs showing a minimal 3-5% advantage over the older A57. Altogether, the past 3 generations of big CPU cores only show a 35% difference in floating-point IPC in Geekbench 4.
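The normalization described above can be approximated from the subtest table alone. The sketch below takes the geometric mean of the per-test E8895/S835 ratios and scales it by the frequency ratio. Because a geometric mean of ratios equals the ratio of geometric means, the baseline Geekbench uses to turn raw results into scores cancels out. The clock speeds (2.31GHz and 2.35GHz) are assumptions chosen for illustration, since this page only states that the peak frequencies fall within 2% of each other, and the function names are hypothetical.

```python
import math

# Geekbench 4 floating-point results from the table:
# (Exynos 8895, Snapdragon 835), each in the subtest's own units.
FP_RESULTS = {
    "SGEMM": (13.4, 11.0),
    "SFFT": (4.02, 2.76),
    "N-Body Physics": (924.5, 844.5),
    "Rigid Body Physics": (6234.9, 5941.6),
    "Ray Tracing": (203.7, 220.6),
    "HDR": (9.49, 8.13),
    "Gaussian Blur": (27.2, 22.2),
    "Speech Recognition": (15.3, 13.2),
    "Face Detection": (583.4, 512.4),
}

# ASSUMED peak big-core clocks in GHz; not taken from this page.
E8895_GHZ = 2.31
S835_GHZ = 2.35


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def raw_advantage(results):
    """Geometric mean of the per-test first/second ratios."""
    return geometric_mean(a / b for a, b in results.values())


def per_ghz_advantage(results, freq_a_ghz, freq_b_ghz):
    """Same ratio after dividing each side by its clock frequency."""
    return raw_advantage(results) * (freq_b_ghz / freq_a_ghz)


if __name__ == "__main__":
    raw = raw_advantage(FP_RESULTS)
    per_ghz = per_ghz_advantage(FP_RESULTS, E8895_GHZ, S835_GHZ)
    print(f"raw advantage:     {(raw - 1) * 100:.1f}%")
    print(f"per-GHz advantage: {(per_ghz - 1) * 100:.1f}%")
```

With the assumed clocks, the per-GHz figure lands close to the 17% quoted above; different clock values would shift it slightly.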

Memory Performance

The Geekbench memory tests show some performance differences between the Galaxy S8’s two SoCs. The E8895 performs significantly better than the S835 in the Memory Copy test that uses the memcpy() routine with randomized offsets; however, the S835 delivers higher bandwidth when streaming to system memory along with a 9% lower latency figure.

Geekbench 4 - Memory Performance (Single Threaded)

| Test | Exynos 8895 (Galaxy S8) | Snapdragon 835 (Galaxy S8) | Snapdragon 821 (LeEco Le Pro3) | Kirin 960 (Mate 9) |
| --- | --- | --- | --- | --- |
| Memory Copy | 6.74 GB/s | 4.32 GB/s | 8.05 GB/s | 4.60 GB/s |
| Memory Latency | 147.5 ns | 134.4 ns | 150.2 ns | 140.4 ns |
| Memory Bandwidth | 14.95 GB/s | 16.87 GB/s | 13.93 GB/s | 17.23 GB/s |

It’s always interesting to look further back in time to see how performance has improved over several generations. In this case, the memory bandwidth test shows the biggest gains when comparing the Galaxy S8 to the Galaxy S5 (S801) and Galaxy S6 (E7420), where the older SoCs can only muster 6.95GB/s and 7.51GB/s, respectively, compared to at least 14.95GB/s for the S8 (E8895). The gains in the Memory Copy test are not as dramatic, with the Snapdragon version of the S8 (4.32GB/s) showing a small improvement over the S5 (3.98GB/s) and S6 (3.35GB/s). The transition from LPDDR3 to higher-frequency LPDDR4 DRAM along the way certainly helped boost performance, as did improvements in CPU microarchitecture (the AGUs in particular).

The S835’s Kryo 280 CPU comes with twice as much L1 cache as the E8895’s M2 CPU: 64KB versus 32KB. The Kryo 280’s L1 latency remains steady at 1.28ns, the same as the A73 core in the Kirin 960, which is about 26% better than the M2’s 1.74ns latency figure. The S835 extends this same latency advantage to the L2 cache as well. When accessing main memory at the upper limit of our own internal test, the S835’s latency advantage over the E8895 shrinks to 9%, nearly matching the Geekbench 4 result.
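The latency comparisons above reduce to two small calculations: the relative reduction between two latency figures, and the conversion from nanoseconds to CPU cycles. The following is a minimal sketch. The latency values come from the text and the Geekbench 4 table, while the clock speeds are assumptions for illustration (this page does not list them) and the function names are hypothetical.

```python
def relative_improvement(better_ns, worse_ns):
    """Fractional latency reduction of the lower figure versus the higher one."""
    return (worse_ns - better_ns) / worse_ns


def latency_cycles(latency_ns, freq_ghz):
    """Convert nanoseconds to cycles: 1 GHz equals 1 cycle per nanosecond."""
    return latency_ns * freq_ghz


# Latency figures quoted in the article.
KRYO280_L1_NS = 1.28
M2_L1_NS = 1.74
S835_DRAM_NS = 134.4    # Geekbench 4 Memory Latency
E8895_DRAM_NS = 147.5   # Geekbench 4 Memory Latency

# ASSUMED peak big-core clocks in GHz; not taken from this page.
S835_GHZ = 2.35
E8895_GHZ = 2.31

if __name__ == "__main__":
    l1_gain = relative_improvement(KRYO280_L1_NS, M2_L1_NS)
    dram_gain = relative_improvement(S835_DRAM_NS, E8895_DRAM_NS)
    print(f"L1 latency advantage:   {l1_gain * 100:.0f}%")
    print(f"DRAM latency advantage: {dram_gain * 100:.0f}%")
    print(f"Kryo 280 L1: {latency_cycles(KRYO280_L1_NS, S835_GHZ):.1f} cycles")
    print(f"M2 L1:       {latency_cycles(M2_L1_NS, E8895_GHZ):.1f} cycles")
```

Under the assumed clocks, the L1 figures work out to roughly 3 cycles for the Kryo 280 and 4 cycles for the M2, which would explain most of the gap in nanoseconds.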

Overall, the E8895’s M2 CPU core holds a small IPC advantage over the S835’s Kryo 280 core in floating-point workloads, but the two cores are pretty evenly matched when working with integers. There are a few specific workloads where each microarchitecture shines, but the theoretical performance difference between the Galaxy S8’s two SoC choices is not as large as expected.


137 Comments


  • goatfajitas - Friday, July 28, 2017

    /edit - buy what suits you
  • zodiacfml - Friday, July 28, 2017

    It is not that big. The "taller" aspect ratio exaggerates the diagonal. To the article, the 10nm SoC now seems more valuable than benchmarks/reviews I've seen from other sites. Since the Pixel is going to be expensive, taller, no storage expansion and without a headphone jack, I have no ideal phone yet this year. The Mi Mix 2 or the LG V30 might.
  • philehidiot - Saturday, July 29, 2017

    Just as a side point, I went from a HTC M9 to an S8. I tried and tested the S8 and S8+. Bear in mind I have small hands to the point where I also pack a pair of socks to compensate. If you're American or not quite so crude that means I prefer a 9mm to a .45. I found the elongated screen of the S8 to be just about tolerable and the advantages for multitasking do outweigh the occasional situation where I need to reach the far end of the screen and can't do it. I suspect most people with normal hands will find the S8 to be perfectly fine from a usability standpoint. Certainly the S8+ I would strongly recommend you try a live model before you buy and perhaps consider waiting for the new Note if big screens are your bag.

    As for the carrying something that big you haven't heard the worst of it. It's well built - teardowns show this. Equally it's still made of glass for crying out loud. You NEED a case (and what's the point in making something so aesthetically amazing when you have to cover it??!!) and not a light one either. I have a leather fold out case which allows me to watch stuff on the phone at an angle and also takes some cards. Interestingly, it has two magnets right next to where the cards live. I got locked out of my hotel room due to this. Regardless, the necessary beef and size of case required to protect such a fragile device means the size is doubled. If HTC had continued down the metal line I'd have gone with them but it's all about bloody glass these days and I'm sick of it.
  • Tttimothy2355 - Friday, July 28, 2017

    Apple stocks galaxy awesome
  • syxbit - Friday, July 28, 2017

    >>"Our initial look at Snapdragon 835 revealed that its Kryo 280 performance cores are loosely based on ARM’s Cortex-A73 while the efficiency cores are loosely based on the Cortex-A53"

    Why would you write such a blatant lie. It's not LOOSELY based at all. It's >95% the same chip. QCOM have made minor tweaks just to be able to market it as their own design.
  • tipoo - Friday, July 28, 2017

    Where would the A10 fall on the ratio/GHz chart I wonder?

    http://images.anandtech.com/graphs/graph11540/sams...
  • name99 - Friday, July 28, 2017

    We can guess.
    You can see the A9 results here:
    http://www.anandtech.com/show/9686/the-apple-iphon...

    Eyeballing it, they are on average about 1.5x the current A73 results.
    A10 results are about 50% faster again, while running at the same frequency as the CPUs referenced in the article, so basically about twice the IPC of the current A73 crop of champions.

    One thing that stands out in comparing the SPEC results across all these devices is the massive jump in 175.vpr. A9 (which, like I said, is at around 1.5x for most results) has a value of 2017. This is about in line with what we see for Snapdragon 821. Then we get these massively (2x larger than I'd expect) scores for the other high-end ARM cores.

    My guess is that something changed in the compiler in the past year or so. (Since the article doesn't say whether gcc or llvm was used, I can't investigate further.) My guess is likewise that this wasn't something nefarious, some "cheat" to make SPEC results look better --- no-one cares about SPEC2000 on ARM64 anyway --- but rather some general improvement in the compiler (perhaps loop unrolling/data placement, but most likely autovectorization) that managed to MASSIVELY improve ARM64 performance on this particular piece of code.
    Presumably (if the change is in LLVM...) Apple picks up the same improvement, but sadly we never got to see the A10 SPEC results. Maybe A11?

    So summary
    - Apple's IPC seems to now be at around 2x ARM competitors for most purposes. (It's at around 1.25x Intel's; but to be fair Intel can clock higher; but to be fair Intel uses more juice)
    - something interesting happened to 175.vpr on ARM64 in the past year or so, and if anyone knows, they should speak up!
  • Nullify - Saturday, July 29, 2017

    I was hoping for Anand to do a deep dive on the A10. Perhaps they're saving it for the A11? Should be the first ARM core in the world to break 4,000 single core on Geekbench, making it a full 2X faster than the 8895 or 835. It's truly amazing how much further ahead Apple is.
  • tuxRoller - Saturday, July 29, 2017

    How big are those some cores, again?
    It's not like this is magic, and these companies know how to make very high IPC if you don't care about cost. Apple has built a massive core, and they pay the price in silicon.
    ARM, and most of their licensees, are optimizing for silicon area efficiency, not absolute performance.

    http://cdn.wccftech.com/wp-content/uploads/2016/10...
  • Meteor2 - Saturday, July 29, 2017

    I think that, as alluded to above, Android and iOS are diverging so much that there's little point comparing Apple IPC to ARM or whoever, you may as well compare it to Power or SPARC.
