This morning AMD is launching the Alveo MA35D, a new dedicated media accelerator and video encode card for data centers, and the first such card to be released under the AMD brand. The card is a successor to an earlier line of Xilinx cards that AMD picked up as part of their Xilinx acquisition, which vaulted them into the market for dedicated video encode cards. The latest-generation Alveo media accelerator card, in turn, promises significant performance gains over its predecessor, quadrupling the maximum number of simultaneous video streams while also adding AV1 and 8K encode support.

Like its predecessor, the Alveo U30, the MA35D is a pure video encode card designed for data centers. That is to say, its ASICs are designed solely for real-time/interactive video encoding, with Xilinx looking to do one thing and do it very well. This design strategy is in notable contrast to competing products from Intel (GPU Flex Series) and NVIDIA (T4 & L4), which are GPU-based products that pair the flexibility of their GPUs with their integrated video encoders, allowing them to serve as video encode cards, gaming cards, or in whatever other role they are assigned. The MA35D, by comparison, is a relatively straightforward product that aims to encode video more efficiently by focusing on that task alone.

As this is a product line inherited by AMD as part of their Xilinx acquisition and developed by the resulting Adaptive & Embedded Computing Group, the Alveo MA35D is both new for AMD and familiar at the same time. Previous data center video encode products released by AMD were based on their GPU lineup, so while this is the latest such video encode card for the ex-Xilinx team, it is the first time AMD proper has launched a dedicated video encode card in this fashion, making it a prime example of the kind of new market opportunities AMD was looking for in acquiring Xilinx.

The target market for the card is, like its predecessor, the data center market. AMD’s principal clients are live streaming services and other interactive video services (think Twitch, cloud gaming, video conferencing, etc.), all of which need to encode large numbers of video streams in real time in a server environment. So like AMD’s EPYC processors, this is very much a server part aimed at a select group of businesses.

Diving into the Alveo MA35D hardware itself, AMD is touting a significant generational upgrade over its predecessor. Whereas the Alveo U30 was an H.264 and H.265 encode card that could encode up to 8 1080p streams, the Alveo MA35D expands this substantially to 32 1080p streams. Meanwhile, support for the latest-generation AV1 codec has been added – joining the existing H.264 and H.265 options – and the maximum stream resolution has been increased from 4K to 8K – itself another quadrupling.
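Both "quadrupling" claims can be checked with a little arithmetic. The sketch below assumes that "4K" and "8K" refer to the consumer UHD resolutions (3840x2160 and 7680x4320); the stream counts are the ones quoted above.

```python
# Generational scaling, Alveo U30 -> Alveo MA35D.
# Assumption: "4K" = 3840x2160 and "8K" = 7680x4320 (UHD resolutions).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

def pixels(name: str) -> int:
    """Number of pixels in a single frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height

# Maximum simultaneous 1080p streams per card, as quoted in the article.
U30_1080P_STREAMS = 8
MA35D_1080P_STREAMS = 32

stream_ratio = MA35D_1080P_STREAMS / U30_1080P_STREAMS
resolution_ratio = pixels("8K") / pixels("4K")

print(f"1080p stream count: {stream_ratio:.0f}x")           # 4x
print(f"max frame size, 8K vs 4K: {resolution_ratio:.0f}x")  # 4x
```

Doubling both width and height is what turns the 4K-to-8K step into a fourfold increase in pixels per frame.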

At the heart of the card is AMD’s unnamed video encode ASIC, which they are calling their Video Processing Unit (VPU). The MA35D contains two VPUs, each with its own 8GB pool of LPDDR5 memory and a PCIe 5.0 x4 connection back to the host processor. The VPU is built on a 5nm process, though strangely AMD is not disclosing the fab being used, which makes us think it’s a Samsung 5nm process (ed: at this point, if someone is using TSMC, they’re usually bragging about it).

Under the hood, each VPU contains 4 video encode blocks, augmented with the various accessory blocks needed to make it a fully functional chip. Two of the encode blocks are full-featured, supporting H.264, H.265, and AV1, while the other two blocks are solely for AV1 – underscoring the additional computational complexity of the new codec. Other blocks on the VPU include video decoder blocks for transcoding, memory controllers, management controllers, a bitrate scaler, composition engines, and a 22 TOPS throughput AI processor to further improve the card’s video encode quality.
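To make the layout concrete, here is a minimal, hypothetical model of the card as described: two VPUs, each with 8GB of memory, a x4 link, two full-featured encode blocks and two AV1-only blocks. The class and function names are illustrative and are not an AMD API. Block counts only show which codecs can run where; they say nothing about per-block throughput, which AMD hasn't broken down.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Hypothetical model of the MA35D layout described above (not an AMD API).
FULL = frozenset({"H.264", "H.265", "AV1"})  # full-featured encode block
AV1_ONLY = frozenset({"AV1"})                # AV1-only encode block

@dataclass(frozen=True)
class EncodeBlock:
    codecs: FrozenSet[str]

@dataclass
class VPU:
    memory_gb: int = 8   # LPDDR5 pool per VPU
    pcie_lanes: int = 4  # PCIe 5.0 x4 link per VPU
    encoders: List[EncodeBlock] = field(default_factory=lambda: [
        EncodeBlock(FULL), EncodeBlock(FULL),
        EncodeBlock(AV1_ONLY), EncodeBlock(AV1_ONLY),
    ])

card = [VPU(), VPU()]  # two VPUs per card

def blocks_supporting(codec: str, vpus: List[VPU]) -> int:
    """Count the encode blocks, across all VPUs, that can handle `codec`."""
    return sum(codec in block.codecs for vpu in vpus for block in vpu.encoders)

for codec in ("H.264", "H.265", "AV1"):
    print(codec, blocks_supporting(codec, card))   # 4, 4, 8
print("memory GB:", sum(v.memory_gb for v in card))    # 16
print("PCIe lanes:", sum(v.pcie_lanes for v in card))  # 8
```

Counted this way, every encode block on the card can produce AV1, but only half of them can produce H.264 or H.265.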

As for the video encode blocks themselves, AMD’s engineers were quick to note that, despite the overlap between this part and AMD’s GPU efforts, the VPU’s video encode blocks are a unique design, not pulled from AMD’s GPU video encode blocks. While I wouldn’t be surprised to see AMD eventually merge encoder IP across its product lines, the Alveo MA35D’s VPUs were in development before the Xilinx acquisition ever closed, so for this generation the former Xilinx team finished what they started. This means that the VPUs are bound to come with their own set of quirks, but there’s also a certain degree of pride from the Alveo team that they’ve built the better video encoder.

The VPU also marks the transition of the Alveo video encoder family to a fully ASIC-based product. Xilinx, of course, is best known for their programmable FPGAs, and while the processors in the previous Alveo U30 used hard logic for their video encode blocks, that logic was combined with an FPGA fabric network, so the product was still a mix of ASIC and FPGA design. The MA35D’s VPUs, on the other hand, are pure ASICs with no FPGA elements, allowing the company to fully exploit the power efficiency benefits of fixed-function logic in a dedicated product.

And energy efficiency is the other major gain over the older U30 card, as well as what AMD considers a significant edge over their competition. The formal TDP of the card is 50 Watts, but in practice AMD is finding that the typical power consumption of the card is closer to about 35 Watts, or a hair over 1W per stream across 32 streams of 1080p60. The U30, by comparison, had a formal TDP of 25 Watts for its 8 streams, putting its worst-case power consumption at a bit over 3W per stream. AMD doesn't provide a similar "typical" power consumption figure for the U30, but at least under maximum load (roughly 1.56W per stream at the MA35D's full 50W TDP), the MA35D should consume half as much energy per stream as its predecessor.
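The per-stream figures follow from dividing card power by stream count. This sketch assumes each card is running its full complement of 1080p60 streams (32 for the MA35D, 8 for the U30) and that power is spread evenly across them, which is a simplification.

```python
# Per-stream power from the quoted figures.
# Assumption: each card runs its full stream count and power splits evenly.
def watts_per_stream(card_watts: float, streams: int) -> float:
    """Average power attributed to each concurrent stream."""
    return card_watts / streams

ma35d_typical = watts_per_stream(35, 32)  # AMD's typical draw: 35/32 = 1.09375 W
ma35d_worst = watts_per_stream(50, 32)    # full 50W TDP: 50/32 = 1.5625 W
u30_worst = watts_per_stream(25, 8)       # U30's 25W TDP: 25/8 = 3.125 W

ratio = ma35d_worst / u30_worst
print(f"MA35D worst case vs U30 worst case: {ratio:.2f}")  # 0.50
```

Comparing worst case to worst case avoids mixing AMD's "typical" MA35D number with the U30's TDP, since no typical figure exists for the older card.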

Meanwhile, new to the Alveo MA35D and its VPU is an AI acceleration block. Unlike GPU-based products, this isn’t for quasi-related AI tasks like image recognition; rather AMD is using the AI accelerator to feed additional data into their video encoder to further improve their encoding quality. Rated for 22 TOPS of performance, the AI processor exists to evaluate streams on a frame-by-frame basis, and then use that analysis to adjust the encode parameters used by the rest of the chip.

Using both region-of-interest encoding and artifact detection, the AI processor essentially allows the MA35D to get away with lower bitrates than a more naïve video encode strategy would need. Region-of-interest encoding allows portions of a video (text, faces, etc.) to receive higher-quality encoding, while artifact detection can catch when the encoder is being fed blocky or otherwise degraded images, which are actually harder to encode, and remove or correct those artifacts before a frame is sent off for encoding.
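AMD hasn't documented how the MA35D turns its AI analysis into encoder settings, but the general region-of-interest idea is commonly expressed as a per-block quantizer map: blocks covering an important region get a lower quantization parameter (QP), and therefore more bits. The sketch below is a generic, hypothetical illustration of that technique, not AMD's implementation. The 16x16 block size, the -6 QP offset, and the 0 to 51 clamp (the H.264/H.265 QP range for 8-bit video; AV1 instead uses a 0 to 255 quantizer index) are all assumptions.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

def roi_qp_map(width: int, height: int, base_qp: int, rois: List[Rect],
               roi_delta: int = -6, block: int = 16,
               qp_min: int = 0, qp_max: int = 51) -> List[List[int]]:
    """Hypothetical ROI sketch: per-block QP values, where blocks touching any
    region of interest get base_qp + roi_delta (lower QP = higher quality)."""
    cols = (width + block - 1) // block   # round up to cover partial blocks
    rows = (height + block - 1) // block
    qp = [[base_qp] * cols for _ in range(rows)]
    for (rx, ry, rw, rh) in rois:
        # Block indices spanned by the rectangle, clipped to the frame.
        c0, c1 = max(rx // block, 0), min((rx + rw - 1) // block, cols - 1)
        r0, r1 = max(ry // block, 0), min((ry + rh - 1) // block, rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                qp[r][c] = max(qp_min, min(qp_max, base_qp + roi_delta))
    return qp

# A 64x32 frame (4x2 blocks) with one detected region, e.g. a caption.
for row in roi_qp_map(64, 32, base_qp=30, rois=[(16, 0, 20, 10)]):
    print(row)
# [30, 24, 24, 30]
# [30, 30, 30, 30]
```

In a real pipeline the rectangles would come from a detector running frame by frame, and an encoder would consume the map through its own QP-map or delta-QP interface.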

All told, AMD is making some fairly aggressive image quality claims with the Alveo MA35D: H.264 and H.265 image quality should be similar to the x264 Medium and x265 Medium presets respectively, while the card’s AV1 encoding quality should be comparable to x265 slow. These comparisons are based on VMAF scores, and on what settings the software encoders need to achieve similar scores. Or to frame things on a bitrate basis, AMD says that using AV1, the MA35D can deliver the same image quality as the Alveo U30 in H.264 mode at 55% of the bitrate (a 1.8x efficiency improvement).
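The "1.8x" figure is simply the reciprocal of the 55% bitrate claim. The sketch below shows that conversion; the 6,000 kbps reference stream is a hypothetical number chosen for illustration, not an AMD figure.

```python
def efficiency_gain(relative_bitrate: float) -> float:
    """Compression-efficiency multiplier when equal quality needs only
    `relative_bitrate` times the reference bitrate."""
    return 1.0 / relative_bitrate

def matched_bitrate(reference_kbps: float, relative_bitrate: float) -> float:
    """Bitrate needed to match the reference stream's quality."""
    return reference_kbps * relative_bitrate

AV1_VS_U30_H264 = 0.55  # AMD's claim: MA35D AV1 vs U30 H.264 at equal quality

gain = efficiency_gain(AV1_VS_U30_H264)
print(f"efficiency gain: {gain:.2f}x")  # 1.82x, which AMD rounds to 1.8x

# Hypothetical 6,000 kbps U30 H.264 stream, for illustration only.
print(f"matched AV1 bitrate: {matched_bitrate(6000, AV1_VS_U30_H264):.0f} kbps")  # 3300 kbps
```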

Finally, although secondary to the video encode capabilities of the MA35D, it’s interesting to note that the management processors in the VPU have shifted from Arm to RISC-V. Whereas the U30’s processors used quad-core Cortex-A53 CPUs, the MA35D VPU uses a pair of quad-core RISC-V CPUs, though AMD doesn’t specify whose RISC-V cores they are. The RISC-V architecture has been quietly pushing out Arm in management controllers such as these, and this is another example of that transition in action.

With two VPUs, the complete Alveo MA35D card is still small enough to come in a single-slot, half-height, half-length form factor. Meanwhile, the 50W TDP means that the card is powered entirely through the PCIe slot, which it attaches to with a PCIe x8 connector (bifurcated down to x4 for each VPU). And, as is typical for data center accelerator cards, the MA35D is passively cooled.
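To put the bifurcated link in rough numbers, the sketch below computes theoretical per-direction bandwidth from the PCIe 5.0 signalling rate (32 GT/s per lane) and its 128b/130b line code. Those two constants come from the PCIe specification rather than from AMD, and the result ignores packet and protocol overhead, so usable throughput will be somewhat lower.

```python
PCIE5_GT_PER_LANE = 32.0  # PCIe 5.0 signalling rate: 32 GT/s per lane
LINE_CODE = 128 / 130     # 128b/130b encoding efficiency

def raw_link_gb_per_s(lanes: int) -> float:
    """Theoretical per-direction bandwidth in GB/s, before packet/protocol overhead."""
    return PCIE5_GT_PER_LANE * LINE_CODE * lanes / 8  # 8 bits per byte

per_vpu = raw_link_gb_per_s(4)     # each VPU's x4 link
whole_card = raw_link_gb_per_s(8)  # the card's x8 connector
print(f"per VPU (x4): {per_vpu:.2f} GB/s")    # 15.75
print(f"card (x8):    {whole_card:.2f} GB/s")  # 31.51
```

Because each VPU gets its own x4 link, the two chips don't contend for a shared upstream connection on the card itself; the x8 total is just the sum of two independent x4 links.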

According to AMD, the Alveo MA35D is sampling to partners now. The company expects to begin production shipments in the third quarter of the year, with a suggested retail price of $1595.


13 Comments


  • brucethemoose - Thursday, April 6, 2023

    > the card’s AV1 encoding quality should be comparable to AV1 slow. These comparisons are based on VMAF scores

    That's crazy. I saw a graph where Nvidia's realtime Ampere AV1 encoder was on par with x264 veryslow, which equates to a quick aomenc av1 preset. "Slow" AV1, on the other hand, is properly slow.

    The AI parameter control thing is super interesting, as that is undoubtedly where tons of bitrate is wasted in streaming. Even relatively primitive, non-realtime tuning (like av1an's vmaf target testing + dark scene boost) has a huge effect, and that's a long way from region-of-interest encoding.
  • Farfolomew - Thursday, April 6, 2023

    Yes that AV1 slow quality surprised me too. Seems too good to be true. Maybe a typo?
  • Hifihedgehog - Thursday, April 6, 2023

    It costs $1595 and is a dedicated video encoding ASIC, so I believe it.
  • brucethemoose - Thursday, April 6, 2023

    It's possible, as AV1's default rate control isn't that sophisticated.
  • brucethemoose - Thursday, April 6, 2023

    Also, VMAF is a great model, but I don't think it can, say, recognize a face as a more important part of a frame to preserve.

    Hence AMD could actually be understating the quality of the output here.
  • Ryan Smith - Thursday, April 6, 2023

    I've double-checked, and that is indeed a typo. AMD is advertising that the MA35D's AV1 encode quality is comparable to x265 slow, not AV1 slow. Apologies for that.
  • brucethemoose - Friday, April 7, 2023

    That makes more sense, but is still pretty good.
  • MrCommunistGen - Thursday, April 6, 2023

    Wondering/hoping that maybe a cut down version of one of these encode blocks could be used to replace AMD's current VCN architecture. To save space, they could possibly leave out the dedicated AI blocks and utilize GPU compute for that instead.
  • brucethemoose - Thursday, April 6, 2023

    Or slap an existing die onto a PCB and sell it as a "streamer edition." I dunno how it's wired internally, but maybe they could give it enough bandwidth to skip the memory ICs and function as a chiplet hanging off the main GPU die.
  • ballsystemlord - Thursday, April 6, 2023

    It only costs as much as an RTX4090! ;)
