Exploring DirectX 12: 3DMark API Overhead Feature Test
by Ryan Smith & Ian Cutress on March 27, 2015 8:00 AM EST
Posted in: GPUs, Radeon, Futuremark, GeForce, 3DMark, DirectX 12
To say there’s a bit of excitement for DirectX 12 and other low-level APIs is probably an understatement. A big understatement. With DirectX 12 ramping up for a release later this year, Mantle 1.0 already in pseudo-release, and its successor Vulkan under active development, the world of graphics APIs is changing in a way not seen since the earliest days, when APIs such as Direct3D, OpenGL, and numerous vendor-proprietary APIs were first released. From a consumer standpoint this change will still take a number of years, but from a development standpoint 2015 is going to be the year that everything changes for PC graphics programming.
So far much has been made about the benefits of these APIs, the potential performance improvements, and ultimately what can be done and what new things can be achieved with them. The true answer to those questions is that this is going to be a multi-generational effort; until games are built from the ground up for these APIs, developers won’t be able to make full use of their capabilities. Even then, the coolest tricks will take some number of years to develop, as developers become better acquainted with these new APIs, their idiosyncrasies, and the capabilities of the underlying hardware when accessed through them. In other words, right now we’re just scratching the surface.
The first DirectX 12 games are expected towards the end of the year, and in the meantime Microsoft and their hardware partners have been ramping up the DirectX 12 ecosystem, hammering out the API implementation in Windows 10 while the hardware vendors write and debug their WDDM 2.0 drivers. Meanwhile, we’ve seen a slow trickle of software designed to showcase DirectX 12 features in a proof-of-concept manner. A number of internal demos exist, and we saw the first semi-public DirectX 12 software release last month with our look at Star Swarm.
This week the benchmarking gurus over at Futuremark are releasing their own first run at a DirectX 12 test with their latest update for the 3DMark benchmark. Futuremark has been working away at DirectX 12 for some time – in fact they were the first partner to show DirectX 12 code in action at Microsoft’s 2014 DX12 unveiling – and now they are releasing their first DirectX 12 project.
In keeping with the general theme of the demos we’ve seen so far, Futuremark’s new DirectX 12 release is another proof of concept test. Dubbed the 3DMark API Overhead Feature Test, this benchmark is a purely synthetic benchmark designed to showcase the draw call benefits of the new API even more strongly than earlier benchmarks. Whereas Star Swarm was a best-case scenario test within the confines of a realistic graphics workload, the API Overhead Feature Test is a proper synthetic benchmark that is designed to test one thing and one thing only: how many draw calls a system can handle. The end result, as we’ll see, showcases just how great the benefits of DirectX 12 are in this situation, allowing for an order of magnitude’s improvement, if not more.
To do this, Futuremark has written a relatively simple test that draws out a very simple scene with an ever-increasing number of objects in order to measure how many draw calls a system can handle before it becomes saturated. As expected for a synthetic test, the underlying rendering task is very simple – render an immense number of building-like objects at both the top and bottom of the screen – and the bottleneck is in processing the draw calls. Generally speaking, under this test you should either be limited by the number of draw calls you can generate (CPU limited) or limited by the number of draw calls you can consume (GPU command processor limited), and not by the GPU’s actual rendering capabilities. The end result is that the API Overhead Feature Test can push an even larger number of draw calls than Star Swarm could.
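To give a feel for what a draw call bound workload looks like in code, here is a minimal sketch of recording a large number of trivial draws into a Direct3D 12 command list. This is not Futuremark's code: the function name, the root signature layout (a single root constant buffer view), and the per-object constant packing are our own illustrative assumptions, and device, resource, render target, and viewport setup are assumed to happen elsewhere.

// Minimal sketch of a draw call bound frame in Direct3D 12 (illustrative only).
// Each object gets its own tiny draw, so the measured cost is the overhead of
// issuing draws rather than the GPU's shading throughput.
#include <d3d12.h>
#include <cstdint>

void RecordDrawCalls(ID3D12GraphicsCommandList* cmdList,
                     ID3D12CommandAllocator*    allocator,
                     ID3D12PipelineState*       pso,
                     ID3D12RootSignature*       rootSignature,
                     const D3D12_VERTEX_BUFFER_VIEW& vbView,
                     const D3D12_INDEX_BUFFER_VIEW&  ibView,
                     D3D12_GPU_VIRTUAL_ADDRESS  perObjectConstants, // 256-byte aligned slices
                     uint32_t                   objectCount,
                     uint32_t                   indicesPerObject)
{
    // Command lists are re-recorded every frame; recording is cheap in DX12
    // because validation and state translation are largely done up front.
    cmdList->Reset(allocator, pso);
    cmdList->SetGraphicsRootSignature(rootSignature);
    cmdList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    cmdList->IASetVertexBuffers(0, 1, &vbView);
    cmdList->IASetIndexBuffer(&ibView);

    // One draw call per object, each pointing at its own slice of a constant buffer.
    for (uint32_t i = 0; i < objectCount; ++i)
    {
        cmdList->SetGraphicsRootConstantBufferView(0, perObjectConstants + i * 256);
        cmdList->DrawIndexedInstanced(indicesPerObject, 1, 0, 0, 0);
    }

    cmdList->Close();
}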
To showcase the difference between the various APIs, this test is available not only in DirectX 12 and Mantle modes, but also in two different DirectX 11 modes: standard single-threaded DirectX 11, and multi-threaded DirectX 11 via deferred contexts (sketched below). The latter has a checkered history – it never worked as well in the real world as initially hoped – and in practice only NVIDIA supports it to any decent degree. But regardless, as we’ll see, DirectX 12’s throughput will put even DX11MT to shame.
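For reference, DirectX 11 multi-threading lets worker threads record draws into deferred contexts, which the immediate context later replays. Below is a minimal sketch of that pattern; the function and parameter names are our own assumptions, and shader, input layout, and buffer setup are omitted. Because the driver still does much of the heavy lifting when the command list is executed, the gains from this model were historically modest.

// Minimal sketch (illustrative, not 3DMark code) of DX11 multi-threading:
// a worker thread records draws into a deferred context, producing a
// command list that the immediate context replays later.
#include <d3d11.h>

HRESULT RecordOnWorkerThread(ID3D11Device*        device,
                             ID3D11Buffer*        perObjectCB,
                             UINT                 objectCount,
                             UINT                 indicesPerObject,
                             ID3D11CommandList**  outCommandList)
{
    ID3D11DeviceContext* deferred = nullptr;
    HRESULT hr = device->CreateDeferredContext(0, &deferred);
    if (FAILED(hr)) return hr;

    // Shader, input layout, and vertex/index buffer binding omitted for brevity.
    for (UINT i = 0; i < objectCount; ++i)
    {
        // Re-binding the constant buffer stands in for per-object state changes.
        deferred->VSSetConstantBuffers(0, 1, &perObjectCB);
        deferred->DrawIndexed(indicesPerObject, 0, 0);
    }

    // Bake the recorded work into a command list for later playback, e.g.
    // immediateContext->ExecuteCommandList(*outCommandList, FALSE);
    hr = deferred->FinishCommandList(FALSE, outCommandList);
    deferred->Release();
    return hr;
}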
Futuremark’s complete technical description is posted below:
The test is designed to make API overhead the performance bottleneck. The test scene contains a large number of geometries. Each geometry is a unique, procedurally-generated, indexed mesh containing 112–127 triangles.
The geometries are drawn with a simple shader, without post processing. The draw call count is increased further by drawing a mirror image of the geometry to the sky and using a shadow map for directional light.
The scene is drawn to an internal render target before being scaled to the back buffer. There is no frustum or occlusion culling to ensure that the API draw call overhead is always greater than the application side overhead generated by the rendering engine.
Starting from a small number of draw calls per frame, the test increases the number of draw calls in steps every 20 frames, following the figures in the table below.
To reduce memory usage and loading time, the test is divided into two parts. The second part starts at 98304 draw calls per frame and runs only if the first part is completed at more than 30 frames per second.
Draw calls per frame    Draw call increment per step    Accumulated duration in frames
192 – 384               12                              320
384 – 768               24                              640
768 – 1536              48                              960
1536 – 3072             96                              1280
3072 – 6144             192                             1600
6144 – 12288            384                             1920
12288 – 24576           768                             2240
24576 – 49152           1536                            2560
49152 – 98304           3072                            2880
98304 – 196608          6144                            3200
196608 – 393216         12288                           3520
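The pattern in the table is regular: each band doubles the draw call count, raising it in sixteen increments of one-sixteenth of the band's starting value, with every step held for 20 frames. As a quick illustration – our own reconstruction of the schedule, not Futuremark's code – the following standalone snippet regenerates the figures above:

// Standalone sketch (assumed reconstruction, not Futuremark's code) of the ramp:
// each band doubles the draw call count in 16 steps of (band start / 16) calls,
// and every step lasts 20 frames.
#include <cstdint>
#include <cstdio>

int main()
{
    const uint64_t framesPerStep = 20;
    uint64_t accumulatedFrames = 0;

    // Part 1 covers 192 - 98304 draw calls per frame; part 2 (98304 - 393216)
    // only runs if part 1 completes at more than 30 frames per second.
    for (uint64_t bandStart = 192; bandStart < 393216; bandStart *= 2)
    {
        const uint64_t increment = bandStart / 16;   // 12, 24, 48, ...
        for (uint64_t calls = bandStart; calls < bandStart * 2; calls += increment)
        {
            accumulatedFrames += framesPerStep;      // 20 frames at 'calls' draws/frame
        }
        std::printf("%7llu - %7llu draw calls/frame: +%6llu per step, %4llu frames total\n",
                    (unsigned long long)bandStart,
                    (unsigned long long)(bandStart * 2),
                    (unsigned long long)increment,
                    (unsigned long long)accumulatedFrames);
    }
    return 0;
}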
113 Comments
Laststop311 - Saturday, March 28, 2015
The benefit will come mainly for people using chips with many cores but poorer single threaded performance. That's basically quad core AMD APU users, 8 core FX chips, and the 6 core Phenom II X6. Users of those CPUs are the people most likely to be CPU bottlenecked, due to DX11 only caring about single thread performance. Since Intel chips have top tier single threaded performance they were not as restricted in DX11, and the GPU was usually the bottleneck to begin with, so not much change there; the GPU will still be shader bound.
silverblue - Saturday, March 28, 2015
I'm glad somebody mentioned the Phenom II X6. I'd be very interested to see how it copes, particularly against the 8350 and 6350.
akamateau - Thursday, April 30, 2015
AMD A6 APU has 4.4 million draw calls per second running DX12. Intel i7 4560 and GTX 980 only has 2.2 million draw calls running DX11!!!! DX12 allows a $100 AMD APU by itself to outperform a $1500 Intel/nVidia gaming system running DX11.
That is with 4 CORES. Single core performance is not relevant any more.
All things being equal, DX12 will give AMD APU and Radeon dGPU a staggering performance advantage over Intel/nVidia.
FlushedBubblyJock - Tuesday, March 31, 2015
What's the mystery? It's Mantle for everyone - that's what DX12 essentially is. So just look at what Mantle did.
Close enough.
StevoLincolnite - Friday, March 27, 2015
The consoles are limited to 6 cores for gaming, not 8, and those 6 cores are roughly equivalent to a Haswell Core i3 in terms of total performance (or a high-clocked Pentium Anniversary!). Remember, AMD's fastest high-end chips struggle to beat Intel's 4 year old mid-range... Take AMD's low-end, low-powered chips and it's a laughable situation.
But that's to be expected; consoles cannot afford to have high-end components, they are cost-sensitive, low-end devices.
Let's just hope that Microsoft and Sony do not beat this horse for all it's worth and we get a successor out within the next 4ish years.
The Xbox One also uses a modified version of DirectX 11 for its high-level API.
The Xbox One also has a low-level API which developers can target to extract more performance.
Basically, once DirectX 12 is finalized for the PC it will be modified and ported to the Xbox One, giving developers who do not buy an already-made game engine like Unreal, CryEngine etc. a performance boost without blowing out development time significantly by being forced to target the low-level API.
The same thing is also occurring on the PlayStation: the high-level API is getting an overhaul thanks to Vulkan, and it still has its low-level API for developers to target of course.
RAM is still a bit of an issue too; 5-5.5GB of RAM for the game and graphics is pretty tiny, and it may become a real limiter in the coming years, slightly offset with hard drive asset streaming.
To compare it to a PC, the Xbox One is like a 3GHz Core i3, 4GB of RAM, and a Radeon 7770 graphics card with 1.5GB.
Change the GPU to a Radeon 7850 for the PS4 and that's what we have for the next half decade or more.
Laststop311 - Saturday, March 28, 2015
Correct me if I'm wrong, but I believe the PS4 is built with a downclocked 7870 (20 CU), though the PS4 iGPU has 2 CUs disabled as well as the downclock. The 7850 is a 16 CU part, but I guess the 2 extra CUs combined with the downclock would make the PS4 behave like a 7850. The Radeon 7770 is only 10 CUs and the Xbone has 12 CUs but a lower clock. So are you basically saying that for the PS4 and Xbone the extra 2 CUs plus the lower clock speed make them equal to those desktop cards? Because they really aren't exactly those cards. In some situations the higher clock speed matters more and in some the extra CUs matter more. In some situations the PS4 may behave more like a 7870 than a 7850, and the Xbone may be more like a 7790 than a 7770.
Gigaplex - Monday, March 30, 2015
The console CPUs are actually significantly slower than a Haswell i3. The Pentium chips are a closer comparison due to the lack of hyperthreading.
mr_tawan - Monday, March 30, 2015
'PC is not meant to be played' (TM). (Just kidding though.)
If the developers do their jobs right, high-spec PCs still gain a big advantage over consoles (especially in the frame rate area). However, PCs themselves can also be a drag (remember those Atom/Pentium-equipped PCs).
JonnyDough - Tuesday, March 31, 2015
Half the time it's just that they don't even bother updating menus and controls. Skyrim is a prime example.
Veritex - Friday, March 27, 2015
All the next generation consoles are based on AMD eight-core CPUs and the GCN architecture (with Nintendo possibly opting for an ARM CPU paired with GCN), so developers will just have to optimize once for the consoles and have an easier time porting to PCs. It is interesting to see the AMD R9 285 Tonga consistently outperform Nvidia's high-end GTX 980, and it makes you wonder how incredibly fast the next generation R9 390X Fiji and 380X could be.