Apple S1 Analysis

One of the biggest issues with the smartwatch trend is that because most companies entered the market with smartphone backgrounds, we tend to see a lot of OEMs trying to shove smartphone parts into a smartwatch form factor. There have been a lot of different Android Wear watches, but for the most part everything seems to use Qualcomm’s Snapdragon 400 without the modem. Even though the Cortex A7 is relatively low power by smartphone standards, it’s probably close to the edge of what is acceptable in terms of TDP for a smartwatch. Pretty much every Android Wear watch pairs a relatively large PCB with a roughly 400 mAh battery at a 3.8 or 3.85 volt chemistry in order to reach 1-2 days of battery life, and the end result is that these smartwatches are simply too big for a significant segment of the market. In order to make a smartwatch that can scale down to sizes small enough to cover most of the market, it’s necessary to make an SoC specifically targeted at the smartwatch form factor.


Capped Apple S1 SoC (Image Courtesy iFixit)

The real question here is what Apple has done. As alluded to in the introduction, it turns out the answer is quite a bit. However, this SoC is basically a complete mystery: there’s not much in the way of proper benchmarking tools or anything else that can be run on the Watch to dig deeper. Based on teardowns, this SoC is fabricated on Samsung’s 28nm LP process, although it’s not clear which flavor of LP is used. It’s pretty easy to eliminate the high power processes, so it’s really a toss-up between HKMG and poly-SiON gate structures. For those unfamiliar with these terms, the main difference that results from this choice is power efficiency, as an HKMG process has less leakage power. Given how little cost is involved in this choice compared to a move to 20/14nm processes, it’s probably a safe bet that Apple is using an HKMG process here, especially considering how dramatically the move from 28LP to 28HPm at TSMC affected battery life for SoCs like Snapdragon 600 and 800.


Decapped & Labeled S1 SoC (Image Courtesy ABI Research)

We also know that binaries compiled for the Watch target ARMv7k. Unfortunately, this is effectively an undocumented ISA. We do know that Watch OS is built on iOS/Darwin, which means a memory management unit (MMU) is necessary to provide memory protection and key abstractions like virtual memory. This effectively rules out MCU ISAs like ARMv7-M, which are not designed around an MMU, so it’s likely that we’re looking at some derivative of ARMv7-A, possibly with some unnecessary instructions stripped out to try and improve power consumption.

The GPU isn’t nearly as much of a mystery here. Given that PowerVR drivers are present in the Apple Watch, it’s fairly conclusive that the S1 uses some kind of PowerVR Series 5 GPU. However, exactly which Series 5 GPU is up for debate. There are reasons to believe it may be a PowerVR SGX543MP1, but I suspect it is in fact PowerVR's GX5300, a specialized wearables GPU from the same family as the SGX543 that would use a very similar driver. Most likely, dedicated competitive intelligence firms (e.g. Chipworks) know the answer, though that's admittedly also the kind of information we expect they would hold on to in order to sell to clients as part of their day-to-day business.

In any case, given that native applications won’t arrive until WatchOS 2 is released, I don’t think we’ll be able to do much in the way of extensive digging here, as I suspect that graphics benchmarks will be rare even after the launch of WatchOS 2.

Meanwhile, after a lot of work and even more research, we're finally able to start shining a light on the CPU architecture in this first iteration of Apple's latest device. One of the first things we can look at is the memory hierarchy, information that is crucial when optimizing applications to ensure that code has enough spatial and/or temporal locality to be performant.

As one can see, there’s a pretty dramatic fall-off between the 28 and 64KB test sizes as we exit the L1 data cache, so we can safely bet that the L1 data cache size is 32KB, given that current shipping products tend to have between 32 and 64KB of L1 data cache. Given the dramatic fall-off that begins around 224KB, we can also safely bet that we’re looking at a 256KB combined L2 cache. That’s fairly small compared to the 1-2MB shared caches we might be used to from today’s large smartphone CPUs, but compared to something like an A5 or A7 it’s about right.
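The usual way to probe a cache hierarchy like this is a pointer-chasing microbenchmark. The sketch below is my own illustration, not the tool used for these measurements: it times a chain of dependent loads over working sets of increasing size, and on an S1 you would expect latency cliffs past 32KB and again past 256KB.

```c
#include <stdlib.h>
#include <time.h>

/* Time one dependent load chain over a working set of `bytes` bytes.
 * The buffer is linked into a randomly shuffled cycle so the hardware
 * prefetcher can't hide memory latency. Returns nanoseconds per load. */
double chase(size_t bytes, size_t iters)
{
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));

    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < n; i++)            /* link buffer into a cycle */
        buf[idx[i]] = &buf[idx[(i + 1) % n]];

    void **p = &buf[idx[0]];
    clock_t t0 = clock();
    for (size_t i = 0; i < iters; i++)        /* fully serialized loads */
        p = (void **)*p;
    clock_t t1 = clock();

    volatile void *sink = p;                  /* defeat dead-code elimination */
    (void)sink;
    free(idx);
    free(buf);
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Sweeping `bytes` from 4KB to 1MB in powers of two and plotting ns/load produces the kind of curve these fall-offs are read from.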

If Apple had just implemented the Cortex A7 as their CPU of choice, the obvious question at this point is whether they’ve really made anything “original” here. To dive deeper, we can look past the memory hierarchy and closer at the machine itself. One of the first things that is obvious is that we’re looking at a CPU with a maximum frequency of 520 MHz, which is telling of the kind of maximum power draw Apple is targeting here.

Apple S1 CPU Latency and Throughput
Instruction                                Throughput (cycles/result)   Latency (cycles)
Load             (ldr reg, [reg])          1                            N/A
Store            (str reg, [reg])          1                            N/A
Move             (mov reg, reg)            1/2                          -
Integer Add      (add reg, reg, imm8)      1/2                          -
Integer Add      (add reg, reg, reg)       1                            1
Integer Multiply (mul reg, reg, reg)       1                            3
Bitwise Shift    (lsl reg, reg)            1                            2
Float Add        (vadd.f32 reg, reg, reg)  1                            4
Double Add       (vadd.f64 reg, reg, reg)  1                            4
Float Multiply   (vmul.f32 reg, reg, reg)  1                            4
Double Multiply  (vmul.f64 reg, reg, reg)  4                            7
Double Divide    (vdiv.f64 reg, reg, reg)  29                           32

Obviously, talking about the cache hierarchy isn’t enough, so let’s get into the actual architecture. On the integer side of things, integer add latency is a single cycle, while integer multiplication latency is three cycles. However, due to pipelining, integer multiplication throughput can still be one result every clock cycle. Similarly, bitshifts take two cycles to complete, but their throughput can be once per clock. Attempting to interleave multiplies and adds results in only half the throughput of either alone. One could guess that this is because the integer add block and the integer multiply block are the same block, but that doesn’t really make sense given just how different addition and multiplication are at the logic level; it more likely reflects an issue restriction in the pipeline.
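The latency-versus-throughput distinction above can be illustrated with a small sketch of my own (not from any Apple code): a single dependent chain of multiplies is bound by the 3-cycle multiply latency, while several independent chains let the pipelined multiplier approach one result per cycle.

```c
#include <stdint.h>

/* One dependent chain: each multiply waits on the previous result,
 * so a 3-cycle multiply latency caps throughput at one result per
 * three cycles. */
uint32_t chain_dependent(uint32_t x, int n)
{
    uint32_t acc = x;
    for (int i = 0; i < n; i++)
        acc = acc * 3u + 1u;
    return acc;
}

/* Four independent chains: the pipelined multiplier can overlap them,
 * approaching the one-result-per-cycle throughput measured above. */
void chain_independent(uint32_t x, int n, uint32_t out[4])
{
    uint32_t a = x, b = x + 1u, c = x + 2u, d = x + 3u;
    for (int i = 0; i < n; i++) {
        a = a * 3u + 1u;
        b = b * 3u + 1u;
        c = c * 3u + 1u;
        d = d * 3u + 1u;
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}
```

Timing both loops with a large iteration count is a quick way to separate an instruction's latency from its throughput on any in-order core.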

Integers are just half of the equation when it comes to data types. We may have Booleans, characters, strings, and integers of varying bit sizes, but when we need to represent decimal values we have to use floating point, which enables a whole host of applications. On low power CPUs like this one, floating point will also often be far slower than integer math because the rules involved in floating point arithmetic are complex. At any rate, a float (32-bit) can be added with a throughput of one result per cycle and a latency of four cycles. The same is true of adding doubles or multiplying floats. However, multiplying or dividing doubles is definitely not a good idea here: peak throughput for multiplying doubles is one result per four clock cycles with a latency of 7 clock cycles, and dividing doubles has a peak throughput of one result every 29 clock cycles with a latency of 32 clock cycles.
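In practice, the takeaway for code running on a core like this is to keep hot loops in 32-bit floats and avoid accidental promotion to double. A small sketch with illustrative names of my own, not Apple code:

```c
/* On an FPU like the one measured above, a double multiply retires only
 * every ~4 cycles versus every cycle for a float multiply, so hot loops
 * should stay in float. Note the 0.5f literal: writing 0.5 would promote
 * the arithmetic to double. Returns the sum of the scaled values. */
float scale_halves(const float *in, float *out, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * 0.5f;
        sum += out[i];
    }
    return sum;
}
```

A stray double literal or a `double` accumulator in a loop like this would roughly quadruple its multiply cost on these timings.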

If you happen to have a webpage open with the latency and throughput timings for the Cortex A7, you’d probably guess that this is a Cortex A7, and you’d probably be right. Attempting to do a load and a store together has a timing that indicates these are mutually exclusive operations which cannot be executed in parallel. The same is true of multiplication and addition, even though the two operations shouldn’t share any logic. Conveniently, the Cortex A7 has a two-wide pipeline with similar limitations. The Cortex A5 is purely single-issue, so despite some similarity it can't explain why addition with an immediate/constant value and a register can happen twice per clock.

Given the overwhelming amount of evidence at the timing level across all these instructions, it’s almost guaranteed that we’re looking at a single core Cortex A7, or a derivative of it, at 520 MHz. Even if this is just a Cortex A7, targeting a far lower maximum clock speed means that logic design can prioritize power efficiency over performance. Standard cell techniques and styles that would unacceptably compromise performance in a 2+ GHz chip can be readily used in a 520 MHz chip, such as device stacking, sleepy stack layout, and higher-Vt selection with negative active body biasing, all of which allow for either a lower voltage at the same frequency or reduced dynamic switching capacitance and static leakage. Given that the Cortex A7 has generally been a winning design for perf/W, I suspect that key points of differentiation will come from implementation rather than architecture for the near future. Although I was hoping to see the Apple Watch on a more leading-edge process like 14LPP/16FF+, I suspect this will be deferred until Apple Watch 2 or 3.


270 Comments


  • name99 - Monday, July 20, 2015 - link

    Regarding ARMv7k, check out the following story:
    http://arstechnica.com/apple/2011/09/support-for-q...

    Note the date --- Sept 2011. Further evidence that Apple plans these things a LONG time in advance,
    (Relative to which, it is interesting to note that over the past month there has been a flurry of activity by Apple people working on LLVM support for M-class processors. Maybe Apple are planning more IoT peripherals in a few years, or maybe they want to stick a small MPU in every Beats headset for some reason?
    Or maybe they are moving from whatever they use today for PMU and sensor fusion on iOS/Watch to an M-class core?)
  • hlovatt - Monday, July 20, 2015 - link

    First, thanks for a great review. Excellent to have such detail.

    I don't wear a watch so won't be getting one. However, I know 4 owners who are all very happy. They all previously owned smart watches (Garmin, Pebble, Fitbit, etc.) and universally prefer the Apple Watch. The tap thing sounds like a gimmick, but just try it - it's really well done.

    Gripe: If you hate Apple so much that you can't be rational, just leave Anandtech. There are plenty of places where you can have a mutual we-hate-Apple session. You are spoiling the site for others who want to discuss tech. If you prefer some other product, just buy it; don't sling insults at others who disagree with you. Get real: the reviewers said they wouldn't recommend the 1st gen device, and you go off saying they have sold out etc. Totally unfair to them.
  • name99 - Monday, July 20, 2015 - link

    While investigating the CPU details is interesting (and thanks!!! for doing this), I think it's important to appreciate that the CPU is probably the least important thing about aWatch performance as it matters to the average person.

    There are IMHO three primary performance problems with aWatch today:
    (a) There is far too little caching (in a very generic sense) so that third party apps (and some interactions with Apple apps) require communicating with iPhone. Much of this will disappear with WatchOS2; some of it may be an inevitable fact of life regarding how BT LE works and, in particular, the minimum possible latency when one side wants to talk to the other. But it's also possible that this latency could be reduced in future versions of BT by changing the rendezvous algorithm?

    (b) The touch screen controller (I assume to save power) only seems to take initial sensor reading at around twice a second. The result is that the first time you touch the screen to scroll, there is an obvious halting until the system sort of "gets it" and starts smoothly scrolling. This is obviously a touch screen issue because using the digital crown (when that is feasible, so for vertical rather than horizontal scrolling) acts immediately and smoothly. The fix, presumably, is to ramp up the rate at which the touch screen controller does its initial sensing, but who knows what the power implications of that are.

    (c) The heart rate sensor is on "full-time" (which means, I don't know, sensing once every 10 seconds?) when you are in the Workout app, but otherwise runs at a really low rate (once every ten minutes?) At least the way I use my aWatch, I'd prefer a higher rate.

    I'm guessing that Apple was overly cautious about battery life in WatchOS1, and now that most people understand what to expect, and have about 40% battery at the end of the day, they can afford to bump up the sampling rates for all these different things (touch screen, heart rate, maybe even BT LE) and if that moves the battery life down to 20% battery at the end of the day, that's a pretty good tradeoff.

    But nowhere in any of this is CPU performance actually an issue. I can't think of anywhere where CPU or GPU performance affect the experience.
  • name99 - Monday, July 20, 2015 - link

    A few comments about fitness:
    The primary thing using the Workout app does, as far as I can tell, is switch to ongoing (rather than coarse) monitoring of heart rate and position, which is useful but not essential. However it DOES also give you a nice display of whatever you consider important. My Pebble used to kinda sorta track steps and thus calories, but the fact that the watch tracks and displays heart rate on the Workout app screen is actually really useful. With the Pebble I'd kinda slack off when doing a run or step climbing, which is only natural, but when your heart rate is displayed you have more incentive to keep pushing.

    The workout app is also nice if you're trying to hit your calorie burn goal every day. If you get to say 10pm or so and are 150 calories short, you can set a calorie goal (rather than say a time goal or a distance goal) and then just start stepping while watching TV or whatever.

    Two useful facts to know (which I don't think you mention). You can launch Workout (or any app) through "Hey Siri launch workout" rather than navigating to the app screen. (It's also useful to know that Hey Siri as a way to start speech ONLY works when the screen is lit up. If you don't know this, it's maddening at first, as half the time it seems to work and half the time it doesn't.)
    Also double-clicking the digital crown toggles between the most recent app and the watch face. I use it a lot to toggle between watch and workout.

    Finally most readers are probably young and think the stand up stuff is dumb or pointless. It really isn't, at least for older people. I've got to the stage where, when I stand up I can feel a kind of stiffness in the muscles, you know that old person sigh when you get up. And I've found that since getting the watch and heeding the stand notices, that has pretty much disappeared --- it really does help older muscles to not get locked into no motion for two or more hours.
    (Also if you find the standing irritating, it's worth noting that the watch wants you to stand for a full minute, with some motion. At first I just used to stand then pretty much immediately sit down. That's not good enough and it won't give you credit for that. But if you stand and pace for a minute or so, it will always give a little ding and reward you with credit for the stand.)

    The one thing I wish (there is so much we can all want them to add to WatchOS2 and then WatchOS3) is a data broadcast mechanism. In particular, if the workout data could be displayed simultaneously on a phone (placed near a TV or on a step machine control panel) that would be much more comfortable than having to flick the wrist every minute or so to check one's heartbeat. Oh well, in time...
  • navysandsquid - Monday, July 20, 2015 - link

    I've never seen such a butthurt bunch of people. Almost every review on the internet gives the Apple Watch a favorable review. Maybe you guys should stop letting your hate consume you. Anandtech has done one of the most in-depth reviews of the Apple Watch, which they do with most products. If you don't like Apple products, don't read the reviews. You're pathetic for even commenting such ignorant things like "a watch under 9 days of battery life is unacceptable." Please name a watch device with this much capability that runs longer than a couple of days. Oh wait, you can't. I've had the watch for about 2 months and this review is spot on whether you like Apple or not. Anandtech is a good review site. So just look in the mirror and ask "Why do I hate them so much?" Let me answer that for you: you don't like paying for what you get. Wait, let me rephrase that: you do like paying for what you get, you're just too cheap to pay for quality. So p1ss off, buy yourself a $149 Samsung smart watch, and let it collect dust lol. I'm done
  • name99 - Monday, July 20, 2015 - link

    "Glances are well-executed and a useful feature, but I don’t really get the point of integrating heart rate monitoring into a glance or similar cases of app information"

    The authors appear unaware that you can customize glances. Go to the Watch app on iPhone and look around. You can both hide glances you find unimportant, and rearrange those that you want to use. Once you've done this, you can basically prioritize so that the most important stuff is in complications, while second tier stuff lives in glances, and third tier stuff requires an app launch.
  • name99 - Monday, July 20, 2015 - link

    "Moving on to the saturation test, we can see that Apple has put a huge amount of effort into calibrating these displays, which is somewhat surprising given that one might expect wearables to not be all that critical when it comes to color accuracy."

    A persistent (and STILL not fixed) problem with the Apple ecosystem is that the faces of contacts display slightly differently on OSX vs iOS. There are outright bugs in the system (.psd photoshop files get incorrectly cropped on iOS, and different gamma is applied on OSX and iOS) but these may be fixed with the new Contacts framework of iOS9/OSX 10.11.
    Point is --- your eye is actually remarkably sensitive to these apparently very slight deviations, at least when it comes to faces. So it makes sense for Apple to line up their hardware so that when they (at LONG FREAKING LAST!) get their software act together, the face photos do look identical across the line.

    (And BTW how long will we have to keep typing in triples like iOS10/OSX 10.12/WatchOS3? At some point, and I think we're reaching that point, it's time to just refer to AppleOS 2015 followed by AppleOS 2016 followed by ...)
  • aryonoco - Monday, July 20, 2015 - link

    I am not an Apple hater, and I am very curious about the Apple Watch and the whole wearables category. However, I agree with those who say that this review was below Anandtech's standards: overly wordy, with too little information. I don't think I have ever said this about an Anandtech review before, but after reading this, I really don't think I learned a single thing that I didn't know going into the review.
  • whiteiphoneproblems - Monday, July 20, 2015 - link

    Without wanting to "pile on," I agree that this review could have been 1/3 the length, and 3x as helpful. I usually look to AT for the "best" review of any mobile device, but I would not say that is the case with this particular review. Most other Apple Watch reviews I've read have been more useful. (I think it comes down to editing.)
  • nrencoret - Monday, July 20, 2015 - link

    +1 on that. I think you nailed it: Anandtech's success is that after reading an article, you always come out at the end a bit (or a lot in some cases) smarter. This review breaks the trend.
