AMD’s Manju Hegde is one of the rare folks I get to interact with who has an extensive background working at both AMD and NVIDIA. He was one of the co-founders and CEO of Ageia, a company that originally tried to bring higher quality physics simulation to desktop PCs in the mid-2000s. In 2008, NVIDIA acquired Ageia and Manju went along, becoming NVIDIA’s VP of CUDA Technical Marketing. The CUDA fit was a natural one for Manju as he spent the previous three years working on non-graphics workloads for highly parallel processors. Two years later, Manju made his way to AMD to continue his vision for heterogeneous compute work on GPUs. His current role is as the Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.

Given what we know about the new AMD and its goal of building a Heterogeneous Systems Architecture (HSA), Manju’s position is quite important. For those of you who don’t remember back to AMD’s 2012 Financial Analyst Day, the formalized AMD strategy is to exploit its GPU advantages on the APU front in as many markets as possible. AMD has a significant GPU performance advantage compared to Intel, but in order to capitalize on that it needs developer support for heterogeneous compute. A major struggle everyone in the GPGPU space faced was enabling applications that took advantage of the incredible horsepower these processors offered. With AMD’s strategy closely married to doing more (but not all, hence the heterogeneous prefix) compute on the GPU, it needs to succeed where others have failed.

The hardware strategy is clear: don’t just build discrete CPUs and GPUs, but instead transition to APUs. This is nothing new as both AMD and Intel were headed in this direction for years. Where AMD sets itself apart is that it is willing to dedicate more transistors to the GPU than Intel. The CPU and GPU are treated almost as equal class citizens on AMD APUs, at least when it comes to die area.

The software strategy is what AMD is working on now. AMD’s Fusion12 Developer Summit (AFDS), in its second year, is where developers can go to learn more about AMD’s heterogeneous compute platform and strategy. Why would a developer attend? AMD argues that the speedups offered by heterogeneous compute can be substantial enough that they could enable new features, usage models or experiences that wouldn’t otherwise be possible. In other words, taking advantage of heterogeneous compute can enable differentiation for a developer.

That brings us to today. In advance of this year’s AFDS, Manju has agreed to directly answer your questions about heterogeneous compute, where the industry is headed and anything else AMD will be covering at AFDS. Manju has a BS in Electrical Engineering (IIT, Bombay) and a PhD in Computer Information and Control Engineering (UMich, Ann Arbor) so make the questions as tough as you can. He'll be answering them on May 21st so keep the submissions coming.


  • BenchPress - Saturday, May 19, 2012 - link

    No, the CPU doesn't have to be extra large. AVX2 barely increases the die size. ALUs are pretty tiny these days.

    And it doesn't matter what "they" want. AMD doesn't write all the applications, other companies do. It's a substantial investment to adopt HSA, which has to pay off in increased revenue. AVX2 is much easier to adopt and will be more widely supported. So from a ROI and risk perspective AVX2 wins hands down.

    Also, Trinity's GPU still loses against Intel's CPUs at things like video transcoding (which it was supposed to be good at). And that's before AVX2 is even supported! So how is AMD going to be able to make the GPU catch up with CPUs supporting AVX2? Actually they have to substantially *exceed* it for HSA to make sense. How much die space will that cost? How power efficient will it be at that point?

    And it gets worse. AVX2 won't be the last challenge HSA will face. There are rumors about AVX-1024 which would execute 1024-bit instructions (already mentioned by Intel in 2010) over four cycles, reducing the power consumption. So this would erase yet another reason for adopting HSA.

    So you have to consider the possibility that AMD might be going in the wrong direction. Them "wanting" to cripple the CPU and beef up the iGPU isn't sufficient for developers to take the effort in supporting such an architecture, and doesn't cancel the competition's advances in CPU technology.

    They need a reality check and implement AVX2 sooner rather than later, or some miracle GPGPU technology we haven't heard of yet. So I'm very curious what Manju has to say on how they will overcome *all* the obstacles.
  • SleepyFE - Saturday, May 19, 2012 - link

    They are not going in the wrong direction. They are merging the CPU and the GPU, and since the GPU can't stand on its own (and since their new GPUs are more or less SIMDs) the CPU will probably take over the GPU operations. That means that the CPU will have the GPU cores available (and the program will have to be written to make use of them), but AVX will require more ALUs (unless these same ALUs are in the GPU part of the chip, making this chat a non-issue).

    BTW when i said extralarge i meant without otherwise not added parts.

    And "THEY" also stands for us consumers, because we want things thinner and faster (only possible if you make use of all available resources)
  • markstock - Wednesday, May 16, 2012 - link

    Mr. Hegde, I have two questions which I hope you will answer.

    1) To your knowledge, what are the major impediments preventing developers from thinking about this new hierarchy of computation and beginning to program for heterogeneous architectures?

    2) AMD clearly aims to fill a void for silicon with tightly-coupled CPU-like and GPU-like computational elements, but are they only targeting the consumer market, or will future hardware be designed to also appeal to HPC users?

    Thank you.

  • tspacie - Thursday, May 17, 2012 - link

    My question is, where does he see the market for these APUs? NVIDIA tried to get consumers interested in GPU compute and it largely flopped. As AT has shown, the main usage of GPU compute for consumers was video transcode and QuickSync does at least as good a job. There appears to be a heterogeneous compute market at the high end (Tesla / Quadro) and the very high end (Cray, Amazon cloud, etc.), but almost none in the consumer space which seems to be where the APUs are being targeted.
  • MySchizoBuddy - Friday, May 18, 2012 - link

    What are AMD's plans for supporting compiler-directive-based speedups like OpenACC, which is supported by PGI, Cray and Nvidia?
  • MySchizoBuddy - Friday, May 18, 2012 - link

    When will APUs allow both the CPU and GPU to access the same memory address, without requiring any data movement for CPU-to-GPU compute?
  • BenchPress - Friday, May 18, 2012 - link

    AMD previously released a roadmap for HSA indicating that the CPU and GPU would share a unified and coherent memory space in 2013:

    This doesn't mean that it eliminates any data movement though. When the GPU reads memory that was previously written by the CPU, it still has to travel all the way from the CPU's cache into the GPU's cache. The only thing you get is that this process will be managed by the hardware, instead of having to do it explicitly in your code. The actual overhead is still there.

    The only way to truly eliminate additional data movement is by using a homogeneous high throughput architecture instead, which will become available in the 2013 timeframe as well.
  • SleepyFE - Saturday, May 19, 2012 - link

    Not sure, but I think the GPU has access to the L2 cache (or they plan on doing that).
  • c0d1f1ed - Friday, May 18, 2012 - link

    Hi Manju,

    I would be very grateful if you could answer these questions:

    In case the GPU becomes swamped by the graphics workload and some CPU cores are un(der)utilized, will it be possible to preempt a general purpose task (say, physics) and migrate it from the GPU to the CPU? If so, to what extent would the developer be responsible for load balancing? Will the operating system be able to schedule GPU kernels the same way as CPU threads, or would the HSA software framework relieve both the application developers and operating system developers of this complex task? How will you ensure that the overhead of migrating the work will be lower than what you gain?

    Are GPU context switches only considered to be a QoS feature (i.e. ensuring that one application can't hog the GPU and make the system unresponsive), or will it also be a viable way to achieve TLP by creating more kernels than the GPU supports concurrently, and regularly switching between them? In other words, how will the context switch overhead compare to that of a CPU in terms of wall time? If suspending and resuming kernels is not recommended, what other synchronization primitives will developers have access to for structuring tasks and dependencies (something analogous to CPU fibers perhaps)? Will the GPU support hardware transactional memory one day?

    It's clear that by making the GPU much more suitable for general purpose computing, some compromises will be made for graphics performance. At the same time, CPUs are gaining high throughput vector processing technology while also focusing on performance/Watt. How does AMD envision balancing (1) scalar execution on the CPU, (2) parallel execution on the CPU, (3) general-purpose execution on the GPU, and (4) graphics execution on the GPU? How do you expect it to evolve in the long term given semiconductor trends and the evolution to platform independent higher level programming?

    AMD has indicated that it will open up HSA to the competition and the software community, obviously in the interest of making it become the dominant GPGPU technology. How will fragmentation of features be avoided? OpenCL and C++ AMP still have some way to go so we'll have many versions, which prevents creating a single coherent ecosystem in which GPGPU code can easily be shared or sold. How will HSA ensure both backward and forward compatibility, even across vendors? To what extent will hardware details be shared with developers? Or will there always be 'black box' components under tight control of AMD, preventing both taking advantage of all hardware characteristics and fixing issues by third parties?

    Where do you expect integrated and discrete GPUs to be heading? Integrated GPUs clearly benefit from lower latency, but in many cases lack the computing power (while also doing graphics) to outperform the CPU, while discrete GPUs can be severely held back by the PCIe bandwidth and latency. Will AMD provide powerful APUs aimed at hardcore gamers, or will all future CPUs include an IGP, sacrificing cores?

    Thank you,
  • chiddy - Friday, May 18, 2012 - link

    One area in which Intel has excelled is ensuring that it is easy for developers to use their technologies, and that the potential of their hardware is well utilized on all platforms.

    On the x86 front, for example, the Intel Windows and Linux x86 C/C++ and Fortran compilers and profilers are considered to be some of the best available, and Intel are very active in all major x86 open source platforms ensuring that their hardware works (e.g. Intel contributed heavily to the port of KVM to Illumos so VT-x/VT-d worked from the beginning).

    This is a similar story on the GPU compute front with Nvidia who provide both tools and code to developers; and Intel will surely follow as indicated with the launch of their new 'Intel® SDK for OpenCL* Applications 2012' and associated marketing.

    What steps will AMD take to ensure firstly that developers have the tools and materials they need to develop on AMD platforms, and more importantly to reassure both developers and purchasers alike that their hardware can run x86 and GPU compute workloads on all platforms with acceptable performance and is thus a solid investment?
