Applied Micro's X-Gene: The First ARMv8 SoC
by Anand Lal Shimpi on November 14, 2011 1:44 PM EST
We covered the X-Gene announcement a couple of weeks ago when the news was first made public. I was in London at the time meeting with Nokia so I didn't get a chance to sit down with Applied Micro's engineers to discuss the SoC and its architecture. Thankfully, upon my return, they gave me the opportunity to do just that.
We've been hearing about ARM based servers for a while now, but their advantage has always been lower power consumption than beefy x86 servers for lighter workloads. You always sacrifice performance and memory addressability. APM hopes to change that with its X-Gene.
Development on X-Gene began three years ago. APM was originally a PowerPC house. The company was working on a 64-bit PowerPC core internally before meeting with ARM and eventually redirecting its efforts to a 64-bit ARM core. Together with ARM, APM started laying the foundation for ARM's first 64-bit instruction set - now known as ARMv8.
At a time when everyone else was working on ARMv7 cores, this gave APM a head start on the ARMv8 transition. As of now there is no officially announced, licensable ARMv8 core from ARM itself. I believe this makes the X-Gene the world's first ARMv8 SoC.
At a high level the X-Gene is pretty beefy. Each CPU core can fetch and decode up to four ARMv8 (or eight Thumb) instructions per clock. APM wouldn't reveal the depth of the pipeline, but it is targeting a 3GHz operating frequency at 28/40nm so it's safe to say that the pipeline is fairly deep. APM did add that it's not quite as deep as the Pentium 4, but rather in the sweet spot. I'd take that to mean we're looking at something around or just shy of 20 stages for the integer pipeline.
APM wouldn't go into detail on the back end configuration of the X-Gene, nor would it comment on other intricacies like branch predictors or cache configuration. We can learn a lot from the front end alone though. Cortex A15 features a 3-issue front end, and moving to 4 implies a generational gap in IPC. Note that we saw a similar transition going from the P6/NetBurst eras to Intel's Conroe (aka Core 2) architecture.
As the X-Gene implements the ARMv8 ISA it is a full 64-bit architecture that is backwards compatible with 32-bit ARMv7. The CPU features hardware virtualization acceleration, MMU virtualization, advanced SIMD instructions and what APM is calling a "very sophisticated" FPU, although once again details were scarce.
Despite the aggressive architecture, each core is estimated to consume only 2W. Like most mobile SoCs, the entire chip will idle at around 300mW.
At the SoC level, APM plans to integrate many of these CPU cores onto a single package. The range is officially 2 - 128 cores, although I expect we'll see something more reasonable than the extremes. The SoC also features integrated SATA (up to six 6Gbps ports per SoC) and two 10GbE controllers.
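To put the core count range in perspective, here is a rough back-of-the-envelope sketch of what the CPU cores alone would draw at the quoted 2W per core. The function name and the sample core counts are mine, not APM's, and the math assumes every core runs at its full estimated power with no contribution from memory controllers, I/O or the rest of the SoC.

```python
# Back-of-the-envelope only: 2W/core is APM's estimate quoted above.
# Everything else here (names, sample core counts) is illustrative.
WATTS_PER_CORE = 2.0


def core_power_watts(cores: int, watts_per_core: float = WATTS_PER_CORE) -> float:
    """Power drawn by the CPU cores alone (no uncore, memory, or I/O)."""
    return cores * watts_per_core


# Walk from the bottom of the official range to the top.
for cores in (2, 8, 32, 128):
    print(f"{cores:>3} cores -> {core_power_watts(cores):.0f} W (cores only)")
```

Under these assumptions the official range spans from a few watts to a few hundred watts of core power, which is part of why I don't expect the extremes to be the common configurations.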
Each SoC can feature up to four 72-bit DDR3 (64-bit + ECC) memory controllers, although lower core count configurations will have fewer memory controllers.
You can plop multiple SoCs down on a single board, connected by a coherent interface that can deliver up to 400Gbps of bandwidth between chips.
APM's performance estimates put a 3GHz X-Gene at roughly half the integer performance of a 2.4GHz Sandy Bridge. The X-Gene advantage however is the ability to integrate many more cores. APM expects a quad-core X-Gene will be able to perform similarly to a dual-core Sandy Bridge Xeon, but with much lower power consumption.
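The arithmetic behind that claim is simple enough to sketch. This is my own illustration, not APM's model: it normalizes one 2.4GHz Sandy Bridge core to 1.0, takes APM's "roughly half" figure at face value, and assumes integer throughput scales linearly with core count, which real workloads rarely manage.

```python
# Illustrative only. The 0.5 ratio and 2W/core are APM's estimates;
# linear scaling across cores is a simplifying assumption of mine.
SANDY_BRIDGE_PER_CORE = 1.0  # one 2.4GHz Sandy Bridge core, normalized
XGENE_PER_CORE = 0.5         # one 3GHz X-Gene core, per APM's estimate
XGENE_WATTS_PER_CORE = 2.0   # APM's per-core power estimate


def aggregate_int_perf(cores: int, per_core: float) -> float:
    """Total integer throughput assuming perfect scaling with core count."""
    return cores * per_core


xgene_quad = aggregate_int_perf(4, XGENE_PER_CORE)
sandy_bridge_dual = aggregate_int_perf(2, SANDY_BRIDGE_PER_CORE)
xgene_quad_core_watts = 4 * XGENE_WATTS_PER_CORE

print(f"Quad-core X-Gene:       {xgene_quad:.1f} (normalized)")
print(f"Dual-core Sandy Bridge: {sandy_bridge_dual:.1f} (normalized)")
print(f"X-Gene core power:      {xgene_quad_core_watts:.0f} W (cores only)")
```

In other words, the quad-core vs. dual-core comparison follows directly from the half-per-core estimate, so it is only as good as that one number.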
Update: APM has since pulled the slide it originally shared with us making the comparison to Intel's Sandy Bridge architecture. The implication is that its performance estimates may have been a bit too aggressive. Only time will tell...
These are all estimates today. The first customer evaluation boards will be available in March 2012. The X-Gene SoCs on the eval boards will be delivered as FPGAs. The ASIC version for actual deployment won't hit until the second half of next year. The first chips will be built on a 40nm process to get them to market quickly and cost effectively, but the design is expected to transition to 28nm afterwards. At 40nm we may not see such aggressive clocks or tons of cores per SoC.
APM expects that even with a late 2012 launch it will have a 1 - 2 year lead on the competition. If it can get the X-Gene out on time, hitting power and clock targets (both very difficult goals), the head start will be tangible. Note that by the end of 2012 we'll only just begin to see the first Cortex A15 implementations. ARMv8 based competitors will likely be a full year out, at least.
There's also the question of whether or not enterprise customers want to move to an ARM based server platform. Unlike in the smartphone/tablet space, x86 is the incumbent in the server arena. Equal performance at lower power consumption is quite attractive, but there's still a lot of convincing that needs to be done. Not to mention that Intel does have the ability to build a competitive, Atom based solution.
More than anything it's good to see such strong competition at both the high end and low end of the microprocessor business. Threatening to disrupt the status quo in both is going to pave the way for progress in our industry.