Investigating NVIDIA's Jetson AGX: A Look at Xavier and Its Carmel Cores
by Andrei Frumusanu on January 4, 2019 11:00 AM EST
Posted in: NVIDIA, SoCs, Xavier, Automotive, Jetson, Jetson AGX
Today’s piece is a bit of an unusual review; NVIDIA’s new Jetson AGX embedded system kit isn’t really a device platform we’re expecting the average reader to think about, much less buy. NVIDIA’s shift over the last few years from offering consumer grade Tegra chipsets to more specialised silicon applications isn’t any more evident than in the new Tegra Xavier which powers the Jetson AGX. While the board's capabilities certainly fall outside of the use-cases of most consumers, it still represents a very interesting platform with a lot of functionality and silicon IP that we don’t find in any other device to this day. So when NVIDIA reached out to offer us a sample, we decided to have a go at assembling a high-level overview of what the board and the new Xavier chip can do.
First of all, we have to describe what this actually is. The Jetson AGX is a full-fledged small form-factor computer / embedded system, with the whole unit measuring no more than 105x105mm. The AGX module itself is designed to be a complete commercial off-the-shelf (COTS) system for use in finished products, with NVIDIA aiming it at AI (read: neural networking) centric use cases such as robotics and industrial automation. Jetson boards typically occupy the small-to-mid volume end of the market, showing up in one-off products and items with limited production runs, where it doesn't make sense for a manufacturer to develop and deploy their own custom hardware.
But of course the bare module is only half the story. You can't do development against a bare module, and this is where NVIDIA's complete Jetson AGX development kit comes in. The AGX dev kit comes with everything needed to run a single module, including a power supply, a heatsink, and most important of all, a breakout board. The breakout board offers various I/O headers and ports, ranging from your standard dual USB-C 3.1 ports, HDMI connectors and Gigabit Ethernet ports, to more specialised connectivity such as MIPI CSI-2 connectors for cameras and a range of typical development board headers such as a 40 pin GPIO connector.
The more unusual connectivity options of the Jetson AGX are the PCIe Gen4 x16 slot as well as an M.2 PCIe x1 extension slot that is meant to be used for connectivity add-ons such as WiFi or cellular modules, both features that aren’t common among Arm development boards as most SoCs don’t have the spare PCIe controllers.
The board comes with many other connectors, and that’s one regard in which the new Jetson AGX doesn’t lack at all in flexibility. Power is supplied by an external generic 19V power supply – the stock one supplied by NVIDIA is a 65W LiteOn unit that seems no different than most laptop charger bricks.
Underneath the quite heavy and solid aluminium heatsink we find what actually powers the Jetson AGX board: the AGX Xavier module. This is a system module that sits on top of the Jetson motherboard – the module has no I/O ports by itself and merely serves as the brains of the system, integrating the core components surrounding the Xavier chip: the 16GB of LPDDR4x memory, a small 32GB eMMC storage chip, and all the power delivery circuits feeding the different power rails of the DRAM and of the Xavier SoC's IP blocks.
The Xavier chip, as said, is the brains of the platform and represents NVIDIA’s biggest and most complex SoC to date. With 9 billion transistors on a die size of 350mm², it’s one of the heavyweights of the Arm ecosystem, although between its initial announcement and today Apple has one-upped NVIDIA in terms of transistor count, as the new A12X is a 10B chip – on a much smaller manufacturing node.
Coming from the traditional PC industry, NVIDIA doesn’t shy away from showing die shots of their products, which is something that is quite rare these days among the Arm SoC vendors. The Xavier SoC is dominated by two big IP blocks which make up the majority of the space allocated on the die: the 8-core “Carmel” CPU complex as well as a four-cluster Volta GPU.
At the high level, the CPU complex contains 8 Carmel CPU cores configured in four clusters, each with a pair of Carmel CPU cores. Each cluster has an independent clock plane and shares a 2MB L2 cache between its two CPU cores. At the higher CPU complex level we find a 4MB L3 cache serving all clusters. We don’t know too much about the microarchitecture of the new Carmel cores - seemingly this looks to be a successor to NVIDIA’s Denver µarch, a design that was characterised by its dynamic code optimisation capability. The only thing that NVIDIA does advertise is that this is a 10-wide superscalar machine (10 execution ports in this case, not 10-wide decode) and has support for the ARMv8.2+RAS instruction set. We’ll come back to the CPU core later in the article.
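As a rough illustration of the layout described above, the cluster arrangement can be written down as a small Python model. The class and field names are hypothetical, and the `cluster_of` helper assumes cores are numbered consecutively within each cluster, which the article does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuComplex:
    """Hypothetical model of the Carmel CPU complex, using the article's figures."""
    clusters: int           # number of CPU clusters
    cores_per_cluster: int  # cores sharing one cluster-level cache
    cluster_cache_mb: int   # cache shared inside each cluster, in MB
    l3_cache_mb: int        # cache shared by all clusters, in MB

    @property
    def total_cores(self) -> int:
        return self.clusters * self.cores_per_cluster

    @property
    def total_cache_mb(self) -> int:
        # Sum of every cluster-level cache plus the complex-wide L3.
        return self.clusters * self.cluster_cache_mb + self.l3_cache_mb

    def cluster_of(self, core_id: int) -> int:
        # Assumption: core IDs are assigned consecutively, cluster by cluster.
        if not 0 <= core_id < self.total_cores:
            raise ValueError(f"core_id {core_id} out of range")
        return core_id // self.cores_per_cluster


# Figures quoted in the article: 4 clusters x 2 cores, 2MB per cluster, 4MB L3.
carmel = CpuComplex(clusters=4, cores_per_cluster=2,
                    cluster_cache_mb=2, l3_cache_mb=4)

print(carmel.total_cores)      # 8
print(carmel.total_cache_mb)   # 12
print(carmel.cluster_of(5))    # 2
```

Under that numbering assumption, two threads pinned to cores in the same cluster would share both a clock plane and a cluster-level cache, while threads in different clusters would only meet at the L3.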
The GPU in Xavier has its roots in the Volta architecture. Here we find the GPU configured into four TPCs (Texture Processing Clusters), each with two SMs (Streaming Multiprocessors), for a total of 8 SMs or 512 ALU lanes/CUDA cores. A most interesting aspect of the GPU is that because it is based on Volta, it also inherits the Tensor cores of its bigger brethren. The Tensor cores add up to 22.6 8-bit TOPS or 11.3 FP16 TOPS of processing power, on top of the 2.8 TFLOPS of FP16 and 1.4 TFLOPS of FP32 CUDA throughput provided by the SMs.
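The GPU figures above hang together arithmetically, which the following Python sketch tries to show. It assumes the usual convention of counting one fused multiply-add per lane per cycle as 2 FLOPs, and that FP16 runs at twice the FP32 rate; the clock it derives is only what the quoted numbers imply, not a frequency stated in the article.

```python
# Figures quoted in the article.
TPC_COUNT = 4
SMS_PER_TPC = 2
CUDA_CORES = 512
FP32_TFLOPS = 1.4
FP16_TFLOPS = 2.8
TENSOR_INT8_TOPS = 22.6
TENSOR_FP16_TOPS = 11.3

# Assumption: each lane retires one fused multiply-add (2 FLOPs) per cycle.
FLOPS_PER_LANE_PER_CYCLE = 2


def implied_clock_ghz(tflops: float, lanes: int, flops_per_lane: int) -> float:
    """Clock frequency (GHz) needed to reach `tflops` with the given lane count."""
    return tflops * 1e12 / (lanes * flops_per_lane) / 1e9


sm_count = TPC_COUNT * SMS_PER_TPC       # 4 TPCs x 2 SMs
cores_per_sm = CUDA_CORES // sm_count    # ALU lanes per SM
fp32_clock_ghz = implied_clock_ghz(FP32_TFLOPS, CUDA_CORES,
                                   FLOPS_PER_LANE_PER_CYCLE)

# Assumption: FP16 is double-rate, so the same clock should explain 2.8 TFLOPS.
fp16_clock_ghz = implied_clock_ghz(FP16_TFLOPS, CUDA_CORES,
                                   2 * FLOPS_PER_LANE_PER_CYCLE)

print(f"SMs: {sm_count}, CUDA cores per SM: {cores_per_sm}")
print(f"Implied GPU clock from FP32 figure: {fp32_clock_ghz:.3f} GHz")
print(f"Implied GPU clock from FP16 figure: {fp16_clock_ghz:.3f} GHz")
print(f"Tensor int8 / FP16 ratio: {TENSOR_INT8_TOPS / TENSOR_FP16_TOPS:.1f}")
```

Both throughput figures point at the same clock of roughly 1.37GHz, and the Tensor core numbers show the same 2:1 relationship between 8-bit and FP16 throughput.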
Alongside the CPU and GPU there are many other important blocks, many of which NVIDIA already covered in its Hot Chips 2018 presentation last summer. The one block that really does augment the Xavier SoC is the new DLA (Deep Learning Accelerator) IP block: this is very much a new type of block that follows the trend we’ve seen in the mobile SoC space – a dedicated machine inferencing acceleration unit not unlike those we’ve seen from the likes of HiSilicon or Apple. NVIDIA’s DLA promises performance of up to 11.4 int8 TOPS and is also capable of FP16 operation at half speed, at 5.7 TOPS. On the SoC, the unit is implemented as a dual-core instance.
Alongside the DLA, the programmable vision accelerator (PVA) is again a key component of the Xavier system that allows it to focus on vision and in particular robotics, embedded AI and automotive use-cases. The PVA is a more traditional vision IP block that handles more rudimentary tasks such as object detection far more efficiently than the GPU or machine inferencing algorithms could. Here the PVA is the first IP block after the ISP in the vision pipeline, segmenting parts of an image into objects that are then forwarded to other algorithms running on the GPU or DLA.
51 Comments
CheapSushi - Friday, January 4, 2019
This is very minor but I'm surprised the ports/connectors aren't more secure on something meant to be in a car. I would expect cables to be screwed in like classic DVI or twist locked in or some other implementation. I feel like the vibration of the car, or even a minor accident, could loosen the cables. Or maybe I got the wrong impression from the kit.
KateH - Friday, January 4, 2019
afaik the generic breakout boards included in dev kits are just for the "dev" part- development and one-offs. a final design would probably use a custom breakout board with just the interfaces needed and in a more rugged form factor thats integrated into the product.
mode_13h - Friday, January 4, 2019
Would've loved to see a Denver2 (Tegra TX2) in that comparison. According to this, they're actually faster than Carmel:
https://openbenchmarking.org/result/1809258-RA-180...
Note that the benchmark results named "TX2-6cores-enabled-gcc-5.4.0" refer to the fact that TX2 had the Denver2 cores disabled by default! Out of the box, it just ran everything on the quad-A57 cluster.
edatech - Saturday, January 5, 2019
Same results also says TX2 is running with higher frequency (TX2 @ 2.04GHz while Jetson Xavier @ 1.19GHz), so not quite an apple to apple comparison.
mode_13h - Saturday, January 5, 2019
I'm not sure how much to read into that number. Would they really run the A57 and Denver2 cores at the same frequency? Is the Xavier figure really the boost, and not just the base clock?
There's also this (newer) result:
https://openbenchmarking.org/result/1812170-SK-180...
Again, my point is that I wish the article had looked at Denver2. It sounds like an interesting, if mysterious core.
Jetson TX2 boards are still available - and at much lower prices than Xavier. So, it's still a worthwhile and relevant question how it compares - especially for those not needing Xavier's Volta and Tensor cores.
LinuxDevice - Monday, January 7, 2019
It isn't so much that the cores are "disabled" (which to me would be something not intended to be turned on) as it is offering multiple power consumption profiles. The whole Jetson market started with the intent to offer it as an OEM reference board, but the reference boards were rather good all by themselves and ended up being a new market. The TX2 Denver cores are simple to turn off or on...but default is off.
Xavier has something similar with the "nvpmodel" tool for switching around various profiles. To see full performance you need to first run "sudo nvpmodel -m 0", and then max out the clocks with the "~nvidia/jetson_clocks.sh" script.
SanX - Saturday, January 5, 2019
Change the publisher asap. The most stupid and insulting ads you will find only at AT. Smells dirt and cheap. Yuck...
I don't have such bad impression from YouTube for example, talk to Google guys.
TheJian - Sunday, January 6, 2019
Double the gpu side at 7nm and throw it in an 100-250w box the size of an xbox/ps and I'm in for a new game console. Was hoping they'd re-enter mobile space with Intel/Qcom/samsung modem at 10 or 7nm since they can be included easily without the same watt issues before. NV doesn't need their own modem today (please come back, mobile gaming is getting great!). We need NV gpus in mobile :)
Also, I refuse to buy old tech in your android tv system. Upgrade the soc, or no sale. COMPETE with msft/sony dang it! It's already a great streamer, but you need the gaming side UP and it needs to be a 150w+ box today or just another streamer (sonly msft are going 250w+ in their next versions probably) or why not just buy a $35-50 roku? Sure you can turn off most of it while streaming (or playing bluray), but power needs to be there for the gaming side. The soc is the only thing holding me back from AndroidTV box from NV for years now. I wanted 2 socs in it when it first launched, then they shrunk it and gave no more power. You're turning me off NV, you should be turning me ON...LOL. I have no desire for another msft/sony console, but I'd buy a HIGH WATT android model. None of this 15-25w crap is worth it. Roku take note too, as in add a gaming soc (call NV!) and gamepad support or no more sales to anyone in our family (we're going HTPC, because streamers suck as anything but streaming). We need multi-function at this point or you don't make it to our living room. HTPC fits everything I guess (thus we're building 3...LOL). Streaming, gaming, ripping, well, heck, EVERYTHING in one box with mass storage inside too. ShieldTV units will sell a LOT better (roku too) if you get better gaming in them. Angry birds alone doesn't count Roku!
A 7nm Tegra without all the crap for cars, etc, would be VERY potent. You have the money to make a great gaming box today. Move it into mobile (a single soc one of course) if the tech takes off by adding a modem. Either way, ShieldTV needs an soc upgrade ASAP. Not looking for RTX type stuff here, just a great general android gaming machine that streams. You have to start here to make a gaming PC on ARM stuff at some point. Use cheap machines to make the bigger ones once entrenched. Make sure it can take a discrete NV card at some point as an upgrade (see what I did there, selling more gpu cards, with no wintel needed). At some point it turns into a full PC :)
That said, I can’t wait for my first car that will drive me around while drinking ;) Designated drivers for all Oh and, our tests are completely invalidated by testing a 12nm vs. 10 & 7nm (and outputting with Ethernet hooked up), but but but….Look at our dumb benchmarks. Note also, cars want a MUCH longer cycle than pc’s or worse, mobile devices. These people don’t upgrade their soc yearly (more like 5-7 tops). So a box you plop in with most of the software done, is great for many car models. We are talking ~81-90mil sold yearly globally (depending on who you believe). Even 10mil of those at $100 a box would be a great add to your bottom line and I’m guessing they get far more than that, but you have to make a point at some price here ;) We are talking 1B even if it’s just $100 Net INCOME per box. That would move NV’s stock price for sure. Something tells me it’s 30%+ margins (I’d guess 50%+ really), but I could be wrong. Has anyone else done this job for less than $1500? Also note, as more countries raise incomes, more cars will be sold yearly.
https://www.statista.com/statistics/200002/interna...
Just as you see here, and the world still needs more cars (heck roads in some places still needed…LOL). Growth. There is room for more than one player clearly for years. Until L5 becomes a commodity there is good money to be had by multiple companies in this space IMHO. Oh and 35mil of those are cars are EU/USA (17.5ea for both). Again, much growth to come as more places get roads/cars, and how many of them have driverless so far? Not many.
At $1500 or under anyone can add this on to a car, as that is cheaper than the $7500 subsidy they have to add to an electric car just to even JOKE about making a dime on them right? And this would NOT be a subsidy. Electric cars are for the rich or stupid. I don’t remember voting for $7500 per car giveaways to make green people happy either! Please KILL THIS ASAP TRUMP! That is 1.5B per car maker (200K cars can be subsidized by each maker). I want a freaking WALL NOW not renewable subsidy crap for products that can’t make money on their own and I am UN-interested in completely as long as gas is available cheaper overall! Screw 5B, tell them $25B or the govt shuts down completely (still a joke, most stays open anyway) for your next 2yrs. Let them pound sand in discretionary spending. :) Only NON-ESSENTIAL people even go home. Well heck, why do I need a NON-essential employee anyway in govt? Let private sector take on all their crap, or just leave it state to state, where they are much better able to handle problems they are versed in.
“The one aspect which we can’t quantize NVIDIA’s Carmel cores is its features: This is a shipping CPU with ASIL-C functional safety features that we have in our hands today. The only competition in this regard would be Arm’s new Cortex A76AE, which we won’t see in silicon for at least another year or more.”
“the Carmel cores don’t position themselves too well.”
Er, uh, would you be saying that at 7nm vs. 7nm?? I’m guessing NV could amp the speeds a bit if they simply took the EXACT core and 7nm’d it right (a new verb?)? Can’t see a way forward? Nobody will have its safety features for a year in the segment it targets DIRECTLY, but you can’t see a way forward?...LOL. Never pass up a chance for an AMD portal site to knock NV. Pause for a sec, while I test it with my 2006 tests that well, aren’t even the target market…Jeez. Possibly make sense to go IN-HOUSE? So you’re saying on the one hand that there was NO OTHER CHOICE for a YEAR, but it’s only POSSIBLY a good idea they went in house? I think you mean, it was ONLY POSSIBLE to go in-house, and thus a BRILLIANT decision to go IN HOUSE, and I can see how this chip really goes FORWARD. There, fixed it. Intel keeps offering GPU designs, and they keep failing correct (adopting AMD tech even)? You go in house until someone beats you at your own game, just ask apple. No reason to give a middle man money unless he is soundly beating you, or you are not making profit as is.
So it’s really good at what it was designed to do, and is a plop in component for cars for ~$1000-1500 with software done pretty much for most? But NV has challenges going forward making money on it…LOL. Last I checked NV has most of the car market sewn up (er, signed up? Pays to be early in many things). Cars are kind of like Cuda. It took ~7yrs before that really took off, but look at it now. Owning everything else, and OpenCL isn’t even on the playing field as AMD can’t afford to FORCE it onto the field alone.
“But for companies looking to setup more complex systems requiring heavy vision processing, or actually deploying the AGX module in autonomous applications (no spellchecker before hitting the website?) for robotics or industrial uses, then Xavier looks quite interesting and is definitely a more approachable and open platform than what tends to exist from competing products.”
Translation: When you use it as it was designed, nobody has a competing offering…LOL. You could have just put the last P as the whole article and forgot the rest. Pencils work great as a writing tool, but when we try to run games on them, well, they kind of suck. I’m shocked. Pencils can’t run crysis? WTH?? I want my money back…LOL. Don’t the rest of the guys have the challenge, of trying to be MORE OPEN and APPROACHABLE? Your article is backwards. You have to dethrone the king, not the other way around. Where will NV be in a year when the competition finally gets something right? How entrenched will they be by then? Cars won’t switch on a dime like I will for my next vid card/cpu…LOL. They started this affair mid 2015 or so, and it will pay off 2021+ as everyone wants a autonomous cars on the road by then.
https://www.thestreet.com/investing/stocks/nvidia-...
https://finance.yahoo.com/news/nvidia-soars-ai-mar...
“we believe that the company is well poised to grow in the driverless vehicle technology space”
Arm makes under 500m (under 400 actually), NV makes how much (9x-10x this?)? Good luck. I do not believe off the shelf will beat a chip designed for auto, so someone will have to CUSTOM their way to victory over NV here IMHO.
https://www.forbes.com/sites/moorinsights/2018/09/...
BMW chose Intel, Tesla switches (and crashes, ½ a million sold so far?? Who cares), but I wonder for how long. I guess it depends on how much work they both want to do, or just plop in Nvidia solutions. I’ll also venture to guess Tesla did it merely to NOT be the same as Volvo, Toyota etc who went with NV. Can’t really claim your different using what everyone else uses. MOOR Insights isn’t wrong much. They have covered L2-L4 and even have built the chip to handle L5 (2 socs in Pegasus). How much further forward do you need to go? It seems they’re set for a bit, though I’m sure they won’t sit idle while everyone else catches up (they don’t have a history of that). TL:DR? It's a sunday morning, I had time and can type 60wpm...LOL.
gteichrow - Sunday, January 6, 2019
FWIW and I know this has been discussed internally at your fine operation (no sarc): But pay option? I'd pay $1-$2/mo to be ad-free. I fully realize it's a PITA to manage that model. I already do this on Medium (barely, barely, barely worth it) and Patreon for others. The time is right, me thinks. Let's pick the Winners from the Losers and be done with it. You folks are in the winning camp, IMO.
It almost goes without saying, but, you'all do a great job and thanks for all the work you folks do!
gteichrow - Sunday, January 6, 2019
Sorry, but meant this to be in the comments below under the discussion about ads that had started. Oops. But thoughts still apply. Cheers.