Earlier this month we had the pleasure of attending Qualcomm’s Maui launch event for the new Snapdragon 865 and 765 mobile platforms. The new chipsets promise a lot of upgrades in terms of performance and features, and will undoubtedly be the silicon upon which the vast majority of 2020 flagship devices base their designs. We’ve covered the improvements and changes of the new chipset in our dedicated launch article, so be sure to read that piece if you’re not yet familiar with the Snapdragon 865.

As has seemingly become a tradition with Qualcomm, following the launch event we’ve been given the opportunity to have some hands-on time with the company’s reference devices, and had the chance to run the phones through our benchmark suite. The QRD865 is a reference phone made by Qualcomm and integrates the new flagship chip. The device offers insight into what we should be expecting from commercial devices in 2020, and today’s piece particularly focuses on the performance improvements of the new generation.

A quick recap of the Snapdragon 865 if you haven’t read the more thorough examination of the changes:

Qualcomm Snapdragon Flagship SoCs 2019-2020

CPU
  Snapdragon 865:
    1x Cortex A77 @ 2.84GHz, 1x 512KB pL2
    3x Cortex A77 @ 2.42GHz, 3x 256KB pL2
    4x Cortex A55 @ 1.80GHz, 4x 128KB pL2
    4MB sL3 @ ?MHz
  Snapdragon 855:
    1x Kryo 485 Gold (A76 derivative) @ 2.84GHz, 1x 512KB pL2
    3x Kryo 485 Gold (A76 derivative) @ 2.42GHz, 3x 256KB pL2
    4x Kryo 485 Silver (A55 derivative) @ 1.80GHz, 4x 128KB pL2
    2MB sL3 @ 1612MHz

GPU
  Snapdragon 865:
    Adreno 650 @ 587 MHz
    +25% perf
    +50% ALUs
    +50% pixel/clock
    +0% texels/clock
  Snapdragon 855:
    Adreno 640 @ 585 MHz

DSP / NPU
  Snapdragon 865:
    Hexagon 698
    15 TOPS AI (Total CPU+GPU+HVX+Tensor)
  Snapdragon 855:
    Hexagon 690
    7 TOPS AI (Total CPU+GPU+HVX+Tensor)

Memory Controller
  Snapdragon 865:
    4x 16-bit CH
    @ 2133MHz LPDDR4X / 33.4GB/s
    or
    @ 2750MHz LPDDR5 / 44.0GB/s
    3MB system level cache
  Snapdragon 855:
    4x 16-bit CH
    @ 1866MHz LPDDR4X / 29.9GB/s
    3MB system level cache

ISP/Camera
  Snapdragon 865:
    Dual 14-bit Spectra 480 ISP
    1x 200MP
    64MP ZSL or 2x 25MP ZSL
    4K video & 64MP burst capture
  Snapdragon 855:
    Dual 14-bit Spectra 380 ISP
    1x 192MP
    1x 48MP ZSL or 2x 22MP ZSL

Encode/Decode
  Snapdragon 865:
    8K30 / 4K120 10-bit H.265
    Dolby Vision, HDR10+, HDR10, HLG
    720p960 infinite recording
  Snapdragon 855:
    4K60 10-bit H.265
    HDR10, HDR10+, HLG
    720p480

Integrated Modem
  Snapdragon 865:
    none (paired with external X55 only)
    LTE (Category 24/22):
      DL = 2500 Mbps, 7x20MHz CA, 1024-QAM
      UL = 316 Mbps, 3x20MHz CA, 256-QAM
    5G NR (Sub-6 + mmWave):
      DL = 7000 Mbps
      UL = 3000 Mbps
  Snapdragon 855:
    Snapdragon X24 LTE (Category 20)
      DL = 2000 Mbps, 7x20MHz CA, 256-QAM, 4x4
      UL = 316 Mbps, 3x20MHz CA, 256-QAM

Mfc. Process
  Snapdragon 865: TSMC 7nm (N7P)
  Snapdragon 855: TSMC 7nm (N7)
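
As a quick sanity check on the memory bandwidth figures in the table, here is a minimal sketch of the usual peak-bandwidth arithmetic: bus width in bytes, times two transfers per clock for double-data-rate memory. The function name and the decimal GB convention are assumptions made for illustration, not something Qualcomm specifies. With these assumptions the 1866MHz and 2750MHz entries come out at the quoted 29.9GB/s and 44.0GB/s, while the 2133MHz entry works out to roughly 34.1GB/s rather than the listed 33.4GB/s, so that figure may rest on a different rounding or convention.

```python
def peak_bandwidth_gbs(channels: int, channel_bits: int, clock_mhz: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal: 1 GB = 1e9 bytes).

    Assumes double-data-rate signalling, i.e. two transfers per clock cycle.
    """
    bus_bytes = channels * channel_bits / 8      # total bus width in bytes
    transfers_per_s = clock_mhz * 1e6 * 2        # DDR: two transfers per clock
    return bus_bytes * transfers_per_s / 1e9


# Configurations as listed in the table: 4x 16-bit channels in every case.
configs = {
    "Snapdragon 855, LPDDR4X @ 1866MHz": 1866,
    "Snapdragon 865, LPDDR4X @ 2133MHz": 2133,
    "Snapdragon 865, LPDDR5  @ 2750MHz": 2750,
}

for name, mhz in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(4, 16, mhz):.1f} GB/s")
```

These are theoretical peaks only; sustained bandwidth on a real device will be lower.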

The Snapdragon 865 is a successor to the Snapdragon 855 last year, and thus represents Qualcomm’s latest flagship chipset offering the newest IP and technologies. On the CPU side, Qualcomm has integrated Arm’s newest Cortex-A77 CPU cores, replacing the A76-based IP from last year. This year Qualcomm has decided against requesting any microarchitectural changes to the IP, so unlike the semi-custom Kryo 485 / A76-based CPUs which had some differing aspects to the design, the new A77 in the Snapdragon 865 represents the default IP configuration that Arm offers.

Clock frequencies and core cache configurations haven’t changed this year: there’s still a single “Prime” A77 CPU core with 512KB cache running at a higher 2.84GHz and three “Performance” or “Gold” cores with reduced 256KB caches at a lower 2.42GHz. The four little cores remain A55s, with the same cache configuration and the same 1.8GHz clock. The L3 cache of the CPU cluster has been doubled from 2MB to 4MB. In general, Qualcomm’s advertised 25% performance uplift on the CPU side comes solely from the IPC increases of the new A77 cores.

The GPU this year features an updated Adreno 650 design which increases ALU and pixel rendering units by 50%. The end result in terms of performance is a promised 25% upgrade. It’s likely that the company is running the new block at a lower frequency than what we’ve seen on the Snapdragon 855, although we won’t be able to confirm this until we have access to commercial devices early next year.

A big performance upgrade on the new chip is the quadrupling of the processing power of the new Tensor cores in the Hexagon 698. Qualcomm advertises 15 TOPS throughput for all computing blocks on the SoC and we estimate that the new Tensor cores roughly represent 10 TOPS out of that figure.

In general, the Snapdragon 865 promises to be a very versatile chip and comes with a lot of new improvements – particularly 5G connectivity and new camera capabilities are promised to be the key features of the new SoC. Today’s focus lies solely on the performance of the chip, so let’s move on to our first test results and analysis.

New Memory Controllers & LPDDR5: A Big Improvement

One of the larger changes in the SoC this generation was the integration of a new hybrid LPDDR5 and LPDDR4X memory controller. The QRD865 device we tested was naturally equipped with the new LP5 standard. Qualcomm was actually downplaying the importance of LP5 itself: the new standard does bring higher memory speeds and thus better bandwidth, however latency should be the same, and the power efficiency benefits, while there, shouldn’t be overplayed. Nevertheless, Qualcomm did claim they focused more on improving their memory controllers, and this year we’re finally seeing the new chip address one of the weaknesses exhibited by the past two generations: memory latency.

We had criticised Qualcomm’s Snapdragon 845 and 855 for having quite bad memory latency: ever since the company introduced its system level cache architecture to the designs, this aspect of the memory subsystem had shown some rather mediocre characteristics. There have been a lot of arguments about how much this actually affected performance, with Qualcomm themselves naturally downplaying the differences. Arm generally notes a 1% performance difference for each 5ns of latency to DRAM, so if the latency gap is big, it can add up to a noticeable difference.



Looking at the new Snapdragon 865, the first thing that pops up when comparing the two latency charts is the doubled L3 cache of the new chip. It’s to be noted that it does look like there’s still some sort of logical partitioning going on and that 512KB of the cache may be dedicated to the little cores, as random-access latencies start going up at 1.5MB for the S855 and 3.5MB for the S865.

Further down in the deeper memory regions, we’re seeing some very big changes in latency. Qualcomm has been able to shave off around 35ns in the full random-access test, and we’re estimating that the structural latency of the chip now falls in at ~109ns, a 20ns improvement over its predecessor. While it’s a very good improvement in itself, it’s still slightly behind the designs of HiSilicon, Apple and Samsung. So, while Qualcomm is still the last of the bunch in regards to its memory subsystem, it’s no longer trailing behind by such a large margin. Keep in mind the results of the Kirin 990 here as we go into more detailed analysis of memory-intensive workloads in SPEC on the next page.
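
To put those latency savings in perspective, the sketch below applies Arm's rule of thumb quoted earlier (roughly 1% of performance per 5ns of DRAM latency) to the two improvements measured here. Extrapolating the rule linearly is an assumption on our part, and the function name is purely illustrative; real workloads will land above or below this depending on how memory-bound they are.

```python
def estimated_uplift_pct(latency_saved_ns: float,
                         pct_per_step: float = 1.0,
                         step_ns: float = 5.0) -> float:
    """Rough performance uplift (in %) from a DRAM latency reduction.

    Linear extrapolation of the "1% per 5ns" rule of thumb; an assumption,
    not a measured relationship.
    """
    return latency_saved_ns / step_ns * pct_per_step


# Latency savings of the Snapdragon 865 over the 855, as measured above.
for label, saved_ns in (("structural latency", 20.0),
                        ("full random-access test", 35.0)):
    print(f"{label}: {saved_ns:.0f}ns saved -> ~{estimated_uplift_pct(saved_ns):.0f}% uplift")
```

By this rough estimate, the 20ns structural improvement alone would be worth on the order of 4% in latency-sensitive workloads.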

Furthermore, what’s very interesting about Qualcomm’s results in the DRAM region is the behaviour of the TLB+CLR Trash test. This test always hits the same cache line within a page across different pages, forcing a cache line replacement. The oddity is that the Snapdragon 865 behaves very differently to the 855 here, with the results showcasing a separate “step” between 4MB and ~32MB. This result is more of an artefact of the test only hitting a single cache line per page rather than the chip actually having some sort of hidden 32MB cache. My theory is that Qualcomm has done some sort of optimisation to the cache-line replacement policy at the memory controller level, and instead of the test hitting DRAM, the data is actually residing in the SLC. It’s a very interesting result, and so far this is the first and only chipset to exhibit such behaviour. If it’s indeed the SLC, the latency would fall in at around 25-35ns, with the non-uniform latency likely being a result of the four cache slices dedicated to the four memory controllers.
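
The "single cache line per page" artefact is easier to see with numbers. The following is a minimal sketch of such an access pattern, not the actual test's implementation: the 4KB page size, the 64-byte cache line and both function names are assumptions chosen for illustration. It only generates the offsets and counts how many bytes of distinct cache lines they reference.

```python
import random

PAGE_BYTES = 4096   # assumed page size
LINE_BYTES = 64     # assumed cache line size


def one_line_per_page_offsets(region_bytes: int, line_index: int = 0, seed: int = 0) -> list[int]:
    """Byte offsets that touch the same cache-line slot in every page of the
    region, visiting the pages in a shuffled order."""
    pages = region_bytes // PAGE_BYTES
    order = list(range(pages))
    random.Random(seed).shuffle(order)
    return [page * PAGE_BYTES + line_index * LINE_BYTES for page in order]


def touched_bytes(offsets: list[int]) -> int:
    """Total size of the distinct cache lines referenced by the pattern."""
    return len({off // LINE_BYTES for off in offsets}) * LINE_BYTES


for region_mb in (4, 32, 128):
    offsets = one_line_per_page_offsets(region_mb * 1024 * 1024)
    print(f"{region_mb:>3}MB region: {len(offsets)} pages, "
          f"{touched_bytes(offsets) // 1024}KB of cache lines touched")
```

Under these assumptions a 32MB test region only references 512KB worth of cache lines, which would fit comfortably inside a 3MB system level cache. That is consistent with the theory above, though it does not prove it.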

Overall, it looks like Qualcomm has made rather big changes to the memory subsystem this year, and we’re looking forward to seeing the impact on performance.


178 Comments


  • joms_us - Monday, December 16, 2019 - link

    Right, he even claimed a 2015 Apple A9 is faster than Skylake and Ryzen processors today. Only a complete !Diot will believe this claim.
  • Quantumz0d - Monday, December 16, 2019 - link

    You should see AT forum. A thread has been dedicated to discuss this BS fanboyism and outcome was Apple won.
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    x86 emulation on Arm has absolutely nothing to do with any topic discussed here or QC vs Apple performance. I'm sick and tired of your tirades here as nothing you say remains technical or on point to the matter.

    The experience I have, when dismissing any other aspects such as iOS's super slow animations, is that the iPhones are far ahead in performance of any Android device out there, which is very much what the benchmark depict.
  • Quantumz0d - Monday, December 16, 2019 - link

    Did I mention anything from your article on QC vs x86 ? I was replying to a comment on "Revolutionary" performance of A series vs x86. And then you claimed it as nonsensical point of x86 on ARM.

    So "super slow animations" & "far ahead". What do you mean by that ? An iPhone X vs a 11 Pro will exhibit the launching speed, then loading speed differences same as 835 vs 855 which can be observed. Everything ApplePro guy did a massive video of iPhones across multiple A series iterations which is the ONLY way a user can see the performance improvement.

    But when Android vs iOS you are saying iPhone animation speeds are super slow yet the benches show much lead..So how is the user seeing the far ahead in performance out there when OP7 Pro vs iPhone 11 Pro Max, like iPhone is still faster as you claim but in reality user is seeing same ?
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    Apparently I'm able say that because I'm able to differentiate between CPU performance, raw performance, and "platform performance".

    CPU performance is clear cut on where we're at and if you're still arguing this then I have no interest in discussing this.

    Raw performance is what I would call things that are not actually affected by the OS, web content *is* far faster on the latest iPhone than on Androids, that's a fact. Among this is actual real applications, when Civilization came to iOS the developers notably commented on the performance being essentially almost as good as desktop devices, the performance is equal to x86 laptops or better: https://www.anandtech.com/show/13661/the-2018-appl...

    And finally, the platform experience includes stuff like the very slow animations. I expect this is a big part as to what you regard as being part of your "experience" and "reality". I even complained about this in the iPhone 11 review as I stated that I feel the hardware is being held back by the software here.

    Now here's what might blow your mind: I can both state that Apple's CPUs are far superior at the same time as stating that the Android experience might be faster, because both statements are very much correct.
  • Quantumz0d - Monday, December 16, 2019 - link

    Okay thanks for that clarity on Raw performance and other breakdowns like CPU, Platform. Yes I can also see that Web performance on A series has always been faster vs Androids.

    I forgot about that article. Good read, and on Civ 6 port however it lacks the GFX options. I would also mention that TFlops cannot be even compared within same company. Like Vega 64 is 12TFs vs a 5700XT at 9TFs, latter completely wrecks the former in majority except for the compute loads utlizing HBM. I know you mentioned the FP16 and other aspects of the figure in opening, just saying as many people just take that aspect. Esp the new Xbox SX and Console as a whole (They add the CPU too into that figure)

    And finally. Yes ARM scales in normal browsing, small tasks vs x86 laptops which majority of the people nowadays are doing (colleagues don't even use PCs) but for higher performance and other workloads ARM cannot cut it at all.

    Plus I'd also add these x86 laptop parts throttle a lot incl. Macbooks obv because they are skimping on cooling them for thinness so their consistency isn't there as well just like A series.
  • joms_us - Monday, December 16, 2019 - link

    When I look at the comparisons here, I look only for Android vs. Android or Apple vs. Apple. Comparing them with different OSes and more so primitive tools is a worthless approach. Firstly, the results need to be normalized, one Soc is showing lead while sucking more power than the other. Secondly, the bloated scores of Apple Soc here does not represent real-world results. Most Android phones with SD855 are faster if not the same than iPhone 11.
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    > Comparing them with different OSes and more so primitive tools is a worthless approach.

    SPEC is a native apples-to-apples comparison. The web benchmarks and the 3D benchmarks are apples-to-apples interpreted or abstracted, same-workload comparisons.
    All the tests here are directly comparable - the tests which aren't and which rely on OS specific APIs, such as PCMark, obviously don't have the Apple data.

    > Firstly, the results need to be normalized, one Soc is showing lead while sucking more power than the other.

    That's a very stupid rationale. If you were to follow that logic you'd have to normalise little cores up in performance as well because they suck much less power.
  • joms_us - Monday, December 16, 2019 - link

    > SPEC is a native apples-to-apples comparison.

    Stop right there, Apple vs. Apple only

    > The web benchmarks and the 3D benchmarks are apples-to-apples interpreted or abstracted, same-workload comparisons.
    All the tests here are directly comparable - the tests which aren't and which rely on OS specific APIs, such as PCMark, obviously don't have the Apple data.

    How? Just like Geekbench, different compilers are used. Different distribution of loads are made.
    My Ryzen 2700 can finished 5 full GB run as fast as one full GB run in an iPhone and yet the single core score of iPhone is higher than any Ryzen. You are showing Apple A13 (LOL A13 is faster than the fastest AMD or Intel chip) using Jurassic Spec benchmark?

    Talk about dreams vs. reality.

    > That's a very stupid rationale. If you were to follow that logic you'd have to normalise little cores up in performance as well because they suck much less power.

    We are talking about efficiency here, your beloved Apple chip is sucking twice the power than SD855 or SD865 per workload.

    Have you ever load a consumer website or run an consumer app with these phones side-by-side? Don't tell they are not using cpu or memory resources. They are, they are doing most if not all of the workloads on the charts here. While your chart if showing Apple has twice the performance vs SD865, the phone doesn't tell lies. A bloated benchmark score does not translate to real-world result.

    It is time to stop this worthless propaganda that Android SoC is inferior than Apple and the laughable IPC king (iPhone chip is faster than desktop processors).

    Until iPhone can play Crysis smoother than even low end laptops, this BS claim that it is the fastest chip should stop.
  • Quantumz0d - Monday, December 16, 2019 - link

    Agreed.

    It really feels like a propaganda every single article on CPU Apple gets super limelight because of these benches on a closed walled garden platform from OS to HW to Repair.

    The power consumption of A series processors deteriorating the battery was nicely thrown under the rug by Apple throttling bs. They even added the latest throttle switch for XS series. But yea no one cares. Apple's deeppockets allow top lawyers in their hands to manipulate every thing.

    The consumer app part. Its perfect use case since we never see any of the Android phones lag as interpreted here due to the dominance of A series by 2-3x folds and in real life nothing is observable. And comparing that to the x86 Desktop machines with proper OS and a computing usecases like Blender, Vray, MATLAB, Compliation, MIPS of Compression and decompression, Decode/Encoding and superior Filesystem support and socketed / Standardized HW (PCIe, I/O options), Virtualization and Gaming, DRAM scaling choice (user can buy whatever memory they want or any HW as its obvious)..this whole thing screams bs. It would be better if the highlight is mentioned on benches and realwork might differ but its not the case at all.

    The worst is spineless corporate agenda of allowing Chinese CPC to harvest every bit from their Cloud data Center in China allowing the subversion and anti liberty. A.k.a Anti American principles.
