Thermal Management on Stacked Silicon

With a standard processor design, there is a single piece of silicon doing all the work and generating the heat. It is bonded to the package (which does no work itself) and then, depending on the implementation, attached with some adhesive either directly to a cooler, or to a heatspreader and then a cooler. When moving to a stacked chiplet design, it gets a bit more complicated.

Having two pieces of silicon that 'do work', even if one is the heavy compute die and the other is an active interposer taking care of USB, audio, and similar IO, does mean that there is a thermal gradient between the two, and depending on the bonding, the potential for thermal hotspots and heat build-up. Lakefield makes it even more complex by having an additional DRAM package placed on top, but not directly bonded.

We can take each of these issues independently. For the case of die-on-die interaction, there is a lot of research going into this area. Discussions and development around fluidic channels between two hot silicon dies have been going on for a decade or longer in academia, and Intel has mentioned the idea a number of times, especially as a potential solution for its new die-to-die stacking technology.

The key here is hot dies, with thermal hotspots. As with a standard silicon design, it is ideally best to keep two high-powered areas separate, as this gives a number of benefits in power delivery, cooling, and signal integrity. With a stacked die, it is best not to have hotspots directly on top of each other, for similar reasons. While Intel uses its leading-edge 10+ process node for the compute die, the base die uses 22FFL, Intel's low-power implementation of its 14nm process. Not only that, but the base die only deals with IO, such as USB and PCIe 3.0, which have essentially fixed bandwidth and energy costs. What we have here is a high-powered die on top of a low-powered die, and as such, thermal issues between the two silicon dies, especially in a low-TDP device like Lakefield (7 W TDP), are not a problem.

What is an issue is how the compute die gets rid of its heat. On the bottom it can conduct heat into the silicon it is bonded to, but the top is ultimately blocked by that DRAM die. As you can see in the image above, there is a big air gap between the two.

As part of the Lakefield design, Intel had to make a number of design changes to get the thermals to work. A lot can be done in the silicon design itself, such as lining up hotspots in the right areas, using suitable metal thicknesses in the various layers, and rearranging the floorplan to reduce localized power density. Ultimately, both increasing the thermal mass and increasing the potential heat dissipation become high priorities.
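To get a feel for why thermal mass and dissipation matter, a first-order lumped RC model is a useful approximation: treat the stack as a single thermal capacitance feeding a single thermal resistance to ambient. The sketch below uses this textbook model with made-up numbers (the power level, resistances, capacitances, and temperature limits are all illustrative, not Intel's thermal data); it simply shows how adding capacitance (mass) or lowering resistance (dissipation) stretches the time a die can hold a given power before hitting its temperature limit.

```python
import math

def time_to_throttle(power_w, r_th, c_th, t_ambient=35.0, t_max=100.0):
    """Seconds until the junction reaches t_max under constant power draw.

    First-order lumped model: T(t) = T_amb + P*R*(1 - exp(-t / (R*C))).
    Returns math.inf if the steady-state temperature never reaches t_max.
    """
    t_steady = t_ambient + power_w * r_th            # temperature as t -> infinity
    if t_steady <= t_max:
        return math.inf                              # sustainable indefinitely
    tau = r_th * c_th                                # thermal time constant (s)
    frac = (t_max - t_ambient) / (power_w * r_th)    # fraction of the full rise
    return -tau * math.log(1.0 - frac)

# Hypothetical 10 W load; R in K/W, C in J/K -- illustrative values only.
print(time_to_throttle(10.0, r_th=8.0, c_th=0.5))    # baseline package, ~6.7 s
print(time_to_throttle(10.0, r_th=8.0, c_th=2.0))    # added thermal mass, ~26.8 s
print(time_to_throttle(10.0, r_th=6.0, c_th=2.0))    # better dissipation: sustainable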

Lakefield CPUs have a sustained power limit of 7 watts, as defined in the specifications. Intel also has another limit, known as the turbo power limit. At Intel's Architecture Day, the company stated that the turbo power limit was 27 watts; however, in the recent product briefing, we were told it is set at 9.5 W. Historically Intel lets its OEM partners (Samsung, Lenovo, Microsoft) choose their own values for these, based on how well each design implements its cooling: passive vs. active, heatsink mass, and so on. Intel also has another factor, the turbo time, essentially a measure of how long the turbo power can be sustained.

When we initially asked Intel for this value, they refused to tell us, stating that it is proprietary information. When I asked again after a group call on the product, I got the same answer, despite informing the Lakefield team that Intel has historically given this information out. Later on, I found out through my European peers that in a separate briefing Intel had given the value of 28 seconds, and Intel emailed me the same figure several hours afterwards. This value can also be set by OEMs.
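For illustration, Intel's sustained/turbo limit scheme is commonly described publicly as budgeting power against an exponentially weighted moving average: the chip may draw the turbo power (PL2) so long as the average power stays below the sustained limit (PL1), with the turbo time acting as the averaging time constant. Below is a minimal simulation of that behavior using the figures above (7 W, 9.5 W, 28 s); the EWMA formulation is an assumption based on how Intel's power limits are generally understood, not a confirmed detail of Lakefield.

```python
# Sketch of a PL1/PL2/tau power-limit scheme: run at the turbo limit (PL2)
# while the exponentially weighted moving average of package power stays
# below the sustained limit (PL1). EWMA behavior is an assumption based on
# public descriptions of Intel's scheme; figures are from the article.
PL1, PL2, TAU, DT = 7.0, 9.5, 28.0, 0.1   # watts, watts, seconds, timestep (s)

avg = 0.0            # moving average of package power; starts from idle
throttle_at = None   # first time the chip falls back to the sustained limit
t = 0.0
while t < 120.0:
    power = PL2 if avg < PL1 else PL1     # turbo while the budget allows
    avg += (DT / TAU) * (power - avg)     # update the moving average
    if power == PL1 and throttle_at is None:
        throttle_at = t
    t += DT

print(f"turbo held for roughly {throttle_at:.1f} s before dropping to PL1")
```

Note that with an EWMA the turbo period is not simply equal to tau: starting from idle, the average here only crosses 7 W after roughly 37 simulated seconds, which is why the OEM-tunable window matters so much in practice.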

I subsequently found one of Intel's ISSCC slides.

This slide shows that a basic implementation would only allow sustained turbo power for 2.5 seconds. Adding an adhesive between the top die and the DRAM moves that up to 12.4 seconds, and improving the system cooling takes it to 20 seconds. The rest of the improvements happen below the compute die: a sizeable gain comes from increasing the die-to-die metal density, and an optimized power floorplan then brings the total to 150+ seconds of sustained turbo power.

Comments

  • AhsanX - Thursday, July 2, 2020 - link

    The Sunny Cove has AVX-512 but the Tremont cores don't have any AVX. So Intel disabled AVX on the Sunny Cove core too, as heat was gonna be a problem if they left it enabled.
  • ikjadoon - Thursday, July 2, 2020 - link

    Did we read the same article? AVX-512 was completely removed (i.e., physically) from the Sunny Cove dies because of Windows 10 compatibility problems.

    Windows was never built for x86 heterogeneous processing and still cannot do anything close now in 2020 (perhaps that would've been a smarter investment than going all-in on touch in 2008!).

    Intel & Microsoft remain stuck in late 2010s for their low-power / mobile-first / thin-client transition caused by the dominating success of smartphones & ARM-based architectures wiping out anything of interest in low-power x86.

    There's a reason Intel just nearly stopped all development on Atom: nobody give a crap about the Pentium Silver & Celeron CPUs.
  • ikjadoon - Thursday, July 2, 2020 - link

    *gave
  • Jorgp2 - Thursday, July 2, 2020 - link

    >There's a reason Intel just nearly stopped all development on Atom: nobody give a crap about the Pentium Silver & Celeron CPUs.

    Lol, no
  • ProDigit - Friday, July 3, 2020 - link

    The $160 laptop I purchased from hp, with an N5000 in it, works really well!
    Would have been better if used on a desktop, and the core count was quadrupled.
  • extide - Thursday, July 2, 2020 - link

    It's not a Windows issue. Even if you ran Linux or any other OS on here you would have to run with CPUs all supporting the same ISA. ARM specifically designs cores to pair up together such that they have the exact same ISA (instruction support), so this isn't an issue in cell phones.

    I mean theoretically you could have the processors support a slightly different ISA and have it throw an interrupt if it tried to execute an instruction that the current core didn't support but a different one did and then the scheduler would have to move that thread to the other core. That could get really janky though, which is why nobody has talked about doing this yet.

    Also, they said in this article that even though Intel said they removed the AVX512 units -- it can still be seen in the die shots.

    Also, Intel didn't stop development on Atom -- this chip has a brand new core and their public roadmaps have several more in the future.
  • reggjoo1 - Tuesday, July 7, 2020 - link

    They're gonna have to develop more for Atom, and get the right "governor" for crossover operation, more than scheduler tweaks. They have a lot to learn, and it may come down to the quality of the I/O system for these to really succeed. As long as their "ego" doesn't get in their way, and they learn from the "smartphone arena", they might have something for x86.
  • dotjaz - Friday, July 3, 2020 - link

    Did we read the same article? AVX-512 was clearly not removed.
  • dotjaz - Friday, July 3, 2020 - link

    There, "Intel has stated on the record repeatedly that they removed it. The die shot of the compute silicon shows that not to be the case." If you can read.
  • jeremyshaw - Friday, July 3, 2020 - link

    That was a later edit. Originally Ian claimed it was removed.
