AMD’s Manju Hegde is one of the rare folks I get to interact with who has an extensive background working at both AMD and NVIDIA. He was one of the co-founders and CEO of Ageia, a company that originally tried to bring higher quality physics simulation to desktop PCs in the mid-2000s. In 2008, NVIDIA acquired Ageia and Manju went along, becoming NVIDIA’s VP of CUDA Technical Marketing. The CUDA fit was a natural one for Manju as he spent the previous three years working on non-graphics workloads for highly parallel processors. Two years later, Manju made his way to AMD to continue his vision for heterogeneous compute work on GPUs. His current role is as the Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.

Given what we know about the new AMD and its goal of building a Heterogeneous Systems Architecture (HSA), Manju’s position is quite important. For those of you who don’t remember back to AMD’s 2012 Financial Analyst Day, the formalized AMD strategy is to exploit its GPU advantages on the APU front in as many markets as possible. AMD has a significant GPU performance advantage compared to Intel, but in order to capitalize on that it needs developer support for heterogeneous compute. A major struggle everyone in the GPGPU space faced was enabling applications that took advantage of the incredible horsepower these processors offered. With AMD’s strategy closely married to doing more (but not all, hence the heterogeneous prefix) compute on the GPU, it needs to succeed where others have failed.

The hardware strategy is clear: don’t just build discrete CPUs and GPUs, but instead transition to APUs. This is nothing new as both AMD and Intel have been headed in this direction for years. Where AMD sets itself apart is that it is willing to dedicate more transistors to the GPU than Intel. The CPU and GPU are treated almost as equal class citizens on AMD APUs, at least when it comes to die area.

The software strategy is what AMD is working on now. AMD’s Fusion12 Developer Summit (AFDS), in its second year, is where developers can go to learn more about AMD’s heterogeneous compute platform and strategy. Why would a developer attend? AMD argues that the speedups offered by heterogeneous compute can be substantial enough that they could enable new features, usage models or experiences that wouldn’t otherwise be possible. In other words, taking advantage of heterogeneous compute can enable differentiation for a developer.

That brings us to today. In advance of this year’s AFDS, Manju has agreed to directly answer your questions about heterogeneous compute, where the industry is headed and anything else AMD will be covering at AFDS. Manju has a BS in Electrical Engineering (IIT, Bombay) and a PhD in Computer Information and Control Engineering (UMich, Ann Arbor) so make the questions as tough as you can. He'll be answering them on May 21st so keep the submissions coming.


  • jamyryals - Monday, May 14, 2012 - link

    While we already see some of these tasks being GPU accelerated, they are by and large experimental or extremely expensive to implement.

    My modification to your question would be: when will these tasks become easily accessible to a non-specialized developer? Only when this happens will this technology become ubiquitous.
  • BenchPress - Monday, May 14, 2012 - link

    Heterogeneous computing will never be easy, which is one of the main reasons why homogeneous computing using AVX2 will prevail. It offers the same parallel computing advantages, without the disadvantages. Any programming language can use AVX2 to speed up vectorizable work, without the developer even having to know about it.

    Also, AVX2 will be supported by every Intel processor from Haswell forward, and AMD will have no other choice but to support it as well soon after. So few developers will be inclined to support a proprietary architecture that is harder to develop for.
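    A minimal sketch of the kind of loop BenchPress is describing: the developer writes plain scalar C, and a compiler's auto-vectorizer (e.g. GCC or Clang with `-O2 -mavx2`) turns it into 256-bit SIMD code with no source changes. The function name and values here are illustrative, not from any real codebase.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* A straightforward scalar loop like this is exactly what an
     * auto-vectorizer targets: compiled with -mavx2, each iteration
     * of 8 floats can map onto one 256-bit fused multiply-add. */
    static void saxpy(float a, const float *x, float *y, size_t n) {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float y[8] = {0};

        saxpy(2.0f, x, y, 8);   /* y becomes {2, 4, 6, ..., 16} */

        assert(y[0] == 2.0f);
        assert(y[7] == 16.0f);
        return 0;
    }
    ```

    This is the sense in which the developer need not "know about" AVX2: the vectorization happens entirely in the compiler back end.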
  • Denithor - Monday, May 14, 2012 - link

    Well, the GPU is already going to be there, so why not find some use for it? For gamers and workstations with a discrete GPU, the iGPU is just going to go to waste otherwise...
  • BenchPress - Tuesday, May 15, 2012 - link

    Only a fraction of systems will have an IGP and a discrete GPU. Also, they'll come in widely varying configurations. This is a nightmare for developers.

    Things become much easier when the GPU concentrates on graphics alone (whether discrete or integrated), and all generic computing is handled by the CPU. NVIDIA has already realized that and backed away from GPGPU to focus more on graphics.

    AVX2 will be available in every Intel CPU from Haswell forward, and AMD will soon follow suit. And with a modest quad-core/module you'll be getting 500 GFLOPS of very flexible and highly efficient throughput computing power.

    I know it seems tempting that if you have three processors you give each of them a purpose, but it's just way too hard to ensure good performance on each configuration. Concentrating on AVX2 (and multi-threading with TSX) will simply give developers higher returns with less effort.
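    A back-of-envelope check of where the "500 GFLOPS" figure for a Haswell quad-core comes from, assuming a sustained clock of 3.5 GHz (the actual clock varies by SKU, so this is an estimate, not a spec):

    ```c
    #include <assert.h>
    #include <stdio.h>

    int main(void) {
        double cores          = 4;
        double fma_ports      = 2;   /* two 256-bit FMA units per Haswell core */
        double floats_per_vec = 8;   /* 256-bit register / 32-bit float */
        double flops_per_fma  = 2;   /* one multiply + one add per FMA */
        double clock_ghz      = 3.5; /* assumed sustained clock */

        double gflops = cores * fma_ports * floats_per_vec
                      * flops_per_fma * clock_ghz;

        /* 4 * 2 * 8 * 2 * 3.5 = 448 peak single-precision GFLOPS,
         * in the neighborhood of the quoted 500 GFLOPS. */
        assert(gflops == 448.0);
        printf("%.1f GFLOPS\n", gflops);
        return 0;
    }
    ```

    The figure is a theoretical peak; sustained throughput depends on keeping both FMA ports fed from cache every cycle.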
  • MJEvans - Monday, May 14, 2012 - link

    I have heard that there are ways of leveraging parallelism in more common programming and scripting languages; and there are also more implicitly parallel languages such as Erlang, but that might be too high level in some aspects for approaching the tasks that GPUs are best at.

    What sort of open platforms is AMD participating in, or even spearheading that would be useful for developers who are more familiar with more traditional/common languages?

    Are there any older languages you'd recommend developers experiment with for fun and educational purposes to help refine thought and use patterns?

    If the best performance might be had by combining existing language inspirations into a new set of programming/scripting languages, can you please link to more information about these new languages?

    Finally, hardware support under diverse operating systems: I know the answer is somewhat of a paradox. The impression (and reality) in gaming is that even when using the binary AMD driver under Linux, performance is lower than what Windows users see on similar hardware. I worry that this might also extend into higher-end workstation and supercomputing applications on the same hardware. From a marketing perspective, would it not make sense to allocate slightly more resources to supporting the latest updates to the popular windowing systems in a timely manner, to support the latest hardware (with the binary driver) on the date of release, to ensure the community driver gets sufficient documentation and helpful hints to reach feature parity sooner rather than later, and to make sure that benchmarks using OpenCL (or whatever tools you use to expose massively parallel processing to programmers) perform well under all operating systems?
  • codedivine - Monday, May 14, 2012 - link

    1. One of the big problems with GPU computing on Windows is Timeout Detection and Recovery. If a GPU is also driving the display, then that GPU is essentially limited to short kernels of, say, around 2 seconds in length. Will this get better in the future?
    Basically, will the GPU be able to context-switch seamlessly between compute apps, UI rendering, 3D apps, etc.?

    2. Will we see good fp64 performance on more consumer GPUs? Will the GPU side of APUs ever get fp64?

    3. AMD's OpenCL implementation currently does not expose all the GPU capabilities. For example, no function pointers even though GCN hardware supports it (if I am understanding correctly).

    4. Will we see a new FireStream product? Also, why hasn't AMD pushed APUs harder in HPC?
  • BenchPress - Monday, May 14, 2012 - link

    1. Not an issue with AVX2.

    2. You get great FP64 performance with AVX2.

    3. Any programming feature can still be supported when compiling for AVX2.
  • codedivine - Monday, May 14, 2012 - link

    I am well aware of AVX2. I didn't ask about that. GPUs, especially discrete GPUs, continue to hold massive advantage when it comes to floating point performance and AVX2 will not change that a whole lot. Also, as already pointed out, HSA is not about CPU vs GPU, but rather CPU+GPU so I am not sure why you keep comparing the two.

    Would be great if you could just focus on the thread.
  • BenchPress - Tuesday, May 15, 2012 - link

    GPUs will only have a 2x advantage in theoretical compute density over CPU cores with AVX2. I wouldn't call this a massive advantage, especially since the GPU suffers badly from having only a tiny amount of cache space per thread. Furthermore, discrete GPUs perform horribly due to the round-trip delay and limited CPU-GPU bandwidth.

    This really is about CPU vs. GPU, because CPU+GPU has no long term future and the GPU can't exist on its own. Hence the only possible outcome is that GPU technology will be merged into the CPU. Maybe we shouldn't call this a CPU any more, but it's definitely not a heterogeneous APU.
  • shawkie - Tuesday, May 15, 2012 - link

    What about memory bandwidth? The latest GPUs have 4GB of device memory running at 200GB/s.
