Battlefield 4 Mantle Performance Preview
by Ryan Smith on February 1, 2014 12:40 PM EST
After a false start or two, AMD is finally getting the first beta of Mantle out the door. With EA DICE having shipped their Mantle patch for Battlefield 4 and developer Oxide having released their Star Swarm technical demo, the first Mantle-enabled applications have landed. Meanwhile AMD for their part is still hammering out an installation issue on their new Mantle-enabled Catalyst drivers, which has led to them missing their previously scheduled January release date.
In the interim, AMD has released a slightly finickier set of drivers to the press for us to play around with ahead of the public Mantle driver release. These drivers should be identical to the public drivers in both function and performance; they just have an outstanding installation bug that requires a workaround, something that AMD doesn’t want in the shipping version. AMD hasn’t provided a public release date for these drivers – at this point it’s in their best interest to avoid providing release dates they aren’t sure they can keep – but given the fact that this is the sole showstopper issue in our press drivers, we certainly don’t expect they’ll take much longer.
In any case, we’re hard at work at the moment putting together our full evaluation of this first version of Mantle. That article won’t be ready until next week, but in the meantime given the immense interest in Mantle, we wanted to quickly publish our first batch of numbers for Battlefield 4. We will have a much wider selection of benchmarks for our full article, including many more video cards and results for Star Swarm, but we wanted to quickly bring you what’s almost certainly going to be the most interesting set of data: Mantle performance with a high-end video card.
For that we’re turning to AMD’s Radeon R9 290X, testing the performance of that card under both Direct3D and Mantle in EA’s Battlefield 4. Battlefield 4 is Mantle’s showcase title and accordingly the first real world use case for AMD’s new API, making it the best place to start. As an application retrofitted with Mantle support we don’t expect Battlefield 4 to tap the complete potential of Mantle right out of the door – certainly not when the Mantle SDK and driver stack itself is still in development – but it can give us an idea of what kind of performance gains we can expect if developers chase the low-hanging fruit offered by Mantle.
What is that low-hanging fruit? For the most part that is going to be CPU bottlenecks, specifically bottlenecking in issuing draw calls. Of all of the bottlenecks that can impact a high performance GPU, keeping it fed can be the biggest bottleneck, and in turn bottlenecking in the draw call submission phase can be the biggest culprit. In the long term Mantle will also benefit GPU performance more directly by optimizing workflows within a GPU, and we already see a small bit of that today in Battlefield 4, but the bulk of the optimizations for these earliest titles have been made around the draw call bottleneck.
For our Mantle preview we’re taking a look at two sections of the Battlefield 4 single player game, the first being from the Tashgar mission and the second being from the South China Sea mission. As was the case with Battlefield 3 the use of single player is less than ideal, but as Battlefield 4 lacks a formal benchmark or for that matter the ability to record multiplayer matches, we’re left with single player if we want to have reasonably repeatable benchmarks. And we’ll definitely want a high degree of repeatability if we’re to be able to distinguish Mantle gains from variability in GPU bound scenarios.
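To make the repeatability point concrete, here is a minimal sketch of how a measured gain can be compared against run-to-run spread. The framerates below are hypothetical placeholders rather than our results, and the helper names `percent_gain` and `run_to_run_spread` are ours for illustration only.

```python
from statistics import mean, stdev


def percent_gain(baseline_fps, new_fps):
    """Relative change of new_fps over baseline_fps, in percent."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


def run_to_run_spread(fps_runs):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(fps_runs) / mean(fps_runs) * 100.0


# Hypothetical average framerates from three repeated passes per API.
d3d_runs = [60.0, 61.0, 59.0]
mantle_runs = [65.0, 66.0, 64.0]

gain = percent_gain(mean(d3d_runs), mean(mantle_runs))
noise = max(run_to_run_spread(d3d_runs), run_to_run_spread(mantle_runs))
print(f"gain: {gain:.1f}%  run-to-run spread: {noise:.1f}%")
```

If the spread between passes approaches the size of the gain, the two renderers cannot be told apart with any confidence, which is why a repeatable single player sequence matters more here than a more representative but noisier multiplayer match.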
Meanwhile, to cover a wider spectrum of possibilities, we’re running our 290X against three CPU configurations on our GPU testbed. The first is our standard configuration: our i7-4960X with all cores and Hyper-Threading enabled (6C/12T), running at 4.2GHz. Our second configuration drops that down to 4C/4T at 2GHz, to test for the benefits of Mantle on a still relatively large core count at lower clockspeeds. Our final configuration takes the core count down further, to 2C/4T at 3GHz, so that we can see what performance is like for processors with fewer cores but higher clockspeeds.
Finally, on a quick note, for measuring Battlefield 4's performance we're using the game's newly built in PerfOverlay.FrameFileLogEnable feature, which replaces FRAPS in this game due to the fact that FRAPS only works with Direct3D and OpenGL. FrameFileLogEnable logs frame times for later analysis, and from this we can reconstruct the minimum and average framerates, and even the full frame pacing performance of the game (but only from the perspective of the game, not the video card). Today we'll be looking at just the average framerates, but be sure to come back next week for our full evaluation, where we'll have frame pacing data and minimum framerates ready to go.
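As a rough illustration of how framerates can be reconstructed from logged frame times, consider the sketch below. It assumes a simplified, hypothetical log with one frame time in milliseconds per line; the actual layout of the file the game writes may well differ, and the function names are ours.

```python
def parse_frame_times(text):
    """Parse one frame time (in ms) per non-empty line. Hypothetical format."""
    return [float(line) for line in text.splitlines() if line.strip()]


def average_fps(frame_times_ms):
    # Total frames divided by total elapsed time, rather than the mean of
    # per-frame fps values, which would overweight the fast frames.
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)


def minimum_fps(frame_times_ms):
    # The slowest single frame sets the instantaneous minimum framerate.
    return 1000.0 / max(frame_times_ms)


# Made-up sample: three 10 ms frames and one 20 ms frame.
sample_log = "10.0\n10.0\n20.0\n10.0\n"
times = parse_frame_times(sample_log)
print(f"average: {average_fps(times):.1f} fps, minimum: {minimum_fps(times):.1f} fps")
```

Note that "minimum framerate" can be defined in more than one way, for instance over one-second windows rather than single frames; the per-frame definition is used here only because it is the simplest to show.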
CPU: | Intel Core i7-4960X @ 4.2GHz
Motherboard: | ASRock Fatal1ty X79 Professional
Power Supply: | Corsair AX1200i
Hard Disk: | Samsung SSD 840 EVO (750GB)
Memory: | G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26)
Case: | NZXT Phantom 630 Windowed Edition
Monitor: | Sharp PN-K321
Video Cards: | AMD Radeon R9 290X (Uber)
Video Drivers: | AMD Catalyst 14.1 Beta
OS: | Windows 8.1 Pro
SP-Tashgar
Our first test comes from the Tashgar mission, and is the benchmark we will be using for day-to-day GPU benchmarking. This benchmark takes place immediately at the start of the mission, with our character driving out of the mountains and into the city of Tashgar. This benchmark has a limited CPU load and is GPU-bound in most situations, which potentially limits the benefits of Mantle in alleviating CPU bottlenecks, but gives us an idea of what kind of performance benefits we can expect in GPU-bound scenarios.
Battlefield 4 Tashgar: Mantle Performance Gains
CPU Configuration | Ultra | High | Low
i7-4960X 6C/12T @ 4.2GHz | 8% | 10% | -14%
i7-4960X 4C/4T @ 2GHz | 8% | 13% | 26%
i7-4960X 2C/4T @ 3GHz | 8% | 13% | 28%
Even at 1080p Ultra, where the Radeon R9 290X is clearly GPU-bound, we can see that switching to Mantle offers some performance improvements. With our i7-4960X fully powered up, this leads to an 8% performance increase, and we see similar performance increases even with other CPU configurations. Since we don’t appear to be CPU-bound in any appreciable way, this gives us a decent idea of what kind of GPU performance benefits Mantle can offer.
Meanwhile if we switch to High and Low settings, the higher framerates are able to tease out the CPU benefits of Mantle. With BF4’s High settings this is 10-13% depending on the CPU configuration, which indicates we’re still significantly GPU bound here.
Using Low quality settings on the other hand significantly widens the gap in both directions, with the minimum gain being -14%, and the maximum gain being 28%. In the case of our 6C/12T CPU configuration, Mantle actually has a detrimental impact on performance, bringing down our framerate from a positively absurd 216fps to a slightly less absurd 181fps. This was unexpected to say the least. We’re not particularly concerned about it, given that we have little reason to use this setting in day-to-day gaming, but it does point to a weakness in the current builds of BF4 and the Mantle drivers.
Otherwise if we move to our slower CPU configurations, the benefits are 26% and 28% for 4C/4T and 2C/4T respectively. Despite the fact that the 4C/4T setup has more real cores to work with, which under normal circumstances would be the stronger setup for a highly threaded application, it’s the 2C/4T setup that technically benefits the most. The difference is quite small, but it’s an interesting outcome nonetheless.
SP-South China Sea
Our second test comes from the South China Sea mission of Battlefield 4, where our character and his squad are on the quickly disintegrating USS Titan. Whereas our first test is rather uniformly GPU-bound, the breakup of the USS Titan offers us the chance to look at a more CPU-bound scenario. Even this scene isn’t exclusively CPU-bound, but with ship parts and other debris flying around everywhere, it’s going to be one of the more strenuous CPU workloads in the single player game.
Battlefield 4 South China Sea: Mantle Performance Gains
CPU Configuration | Ultra | High | Low
i7-4960X 6C/12T @ 4.2GHz | 7% | 8% | 7%
i7-4960X 4C/4T @ 2GHz | 10% | 26% | 17%
i7-4960X 2C/4T @ 3GHz | 10% | 30% | 28%
Starting once again at 1080p Ultra, even with the greater CPU workload presented by this test, we are unsurprisingly still GPU-bound on Ultra settings. The benefits aren’t as uniform as last time – they now range from 7% to 10% – but it’s safe to say that we’re once again seeing what are mostly the GPU performance benefits of Mantle.
However shifting to High quality shows much greater performance gains, indicating that we’re at least partially (if not fully) CPU-bound here. Once we reduce our CPU performance from 6C/12T to 4C/4T, the performance gains from using Mantle jump from 8% to 26%, and then to 30% when using our 2C/4T configuration. For a game that’s not immensely CPU bound in the first place and has been retrofitted for Mantle, this is towards the upper bound of what we would expect.
Finally switching over to our Low quality settings causes our performance gains to actually taper off some. We’re still CPU-bound on our 4C/4T setup leading to a 17% performance gain, but we’re not as CPU-bound as we were at High quality settings, apparently. Meanwhile the performance gains for 2C/4T remain similar to last time, at 28%. Battlefield 4 has multiple CPU tasks going on here, not the least of which is the simulation itself, so in the case of our 4C/4T setups it’s likely we’ve stumbled onto a situation where the game is more strongly CPU-bound by the simulation and other aspects of the game than it is the submission of draw calls.
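The reasoning above can be sketched with a toy bottleneck model. It assumes, purely for illustration, that CPU and GPU work overlap perfectly so that the slower side sets the frame time, and that CPU time splits into simulation and draw call submission. Every number is invented rather than measured, and the model deliberately ignores the GPU-side gains we see at Ultra.

```python
def frame_time_ms(sim_ms, submit_ms, gpu_ms):
    """Toy model: CPU and GPU work overlap, so the slower side sets the frame time."""
    return max(sim_ms + submit_ms, gpu_ms)


def gain_percent(sim_ms, submit_ms, gpu_ms, submit_scale):
    """Framerate gain when draw call submission cost is multiplied by submit_scale."""
    before = frame_time_ms(sim_ms, submit_ms, gpu_ms)
    after = frame_time_ms(sim_ms, submit_ms * submit_scale, gpu_ms)
    return (before / after - 1.0) * 100.0


# Invented workloads; in each case submission cost is halved (submit_scale=0.5).
gpu_bound = gain_percent(sim_ms=4.0, submit_ms=4.0, gpu_ms=16.0, submit_scale=0.5)
submit_bound = gain_percent(sim_ms=4.0, submit_ms=12.0, gpu_ms=8.0, submit_scale=0.5)
sim_bound = gain_percent(sim_ms=12.0, submit_ms=4.0, gpu_ms=8.0, submit_scale=0.5)

print(f"GPU-bound: {gpu_bound:.0f}%")
print(f"submission-bound: {submit_bound:.0f}%")
print(f"simulation-bound: {sim_bound:.0f}%")
```

In this sketch the same reduction in submission cost yields nothing when the GPU is the limit, a large gain when submission dominates CPU time, and a much smaller gain when simulation dominates, which is the pattern we suspect is behind the 4C/4T result at Low quality.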
First Thoughts
As this is only a brief preview of our results we don’t intend to read too much into this limited data set, but even just looking at the 290X does provide us with some interesting data. For the pure high-end scenario – a 290X or similar GPU with a high-end CPU – Mantle can still offer performance benefits from the GPU workflow optimizations it provides. A 7-10% performance increase is not a dramatic difference, but it is 7-10% better performance than AMD had yesterday.
Meanwhile it comes as little surprise that the greatest performance benefits in our limited BF4 testing come in the mixed performance scenarios, pairing up a high-end GPU with slower CPUs. Since the lowest hanging fruit for Mantle optimizations is going to be CPU draw call bottlenecks, it’s going to be the weaker CPUs that have the most to gain here. In this case we still need to go out of our way to create CPU-bound scenarios – the 290X is rarely held back by the CPU on Ultra quality settings – but when we do create them we can see some of the potential that Mantle can offer. At High and Low quality settings, and excluding our one Mantle performance regression, we see performance gains anywhere between 7% and 30%. This shows (if nothing else) that even a retrofit game with a highly optimized Direct3D rendering path, like Battlefield, can still be bottlenecked by draw call performance, and consequently that some of Mantle’s CPU overhead reduction capabilities do in fact pan out.
Whether all of this is worth the costs and tradeoffs of Mantle, from both a consumer perspective and a developer perspective, is a longer discussion that we’ll be having next week, alongside our expanded benchmark results. But at first glance it looks like AMD has cleared the first hurdle, which is showcasing that there are tangible benefits to having a low-level graphics API. Now AMD just needs to further hammer out their Mantle drivers and get them into a publicly consumable state, so that the wider community of end-users can test and evaluate AMD’s Mantle offering. Outside of the known installation issue we have not encountered any issues with Mantle thus far – this being despite the fact that AMD is being very explicit about the beta nature of the Mantle stack – so hopefully this is a good omen for the company after the delays leading up to this point.
AMD's Official Performance Data
Finally, we’ll quickly close with some of AMD’s performance numbers, which they’ve published in their reviewer’s guide. We feel that vendor-provided numbers should always be taken with a grain of salt, but they do serve their purpose, especially for getting an idea of what performance is like under a best case scenario. To that end we can quickly see that AMD was able to top out at a 41% performance improvement on a 290X paired with an A10-7700K. This is a greater performance gain than the peak gain of 30% we’ve seen in our own results, but not immensely so. More importantly it can give us a good idea of what to reasonably expect for performance under Battlefield 4. If AMD’s results are accurate, then a 40% performance improvement is the most we should be expecting out of Battlefield 4’s Mantle renderer.
135 Comments
MugatoPdub - Saturday, February 1, 2014 - link
Thanks for the prelim, I have to add however; I don't see how a 4960x and a 290x are the most "wanted" items to benchmark. I can wait, but I would believe most people would like to see benchs on mainstream hardware, i.e., R9 270x and an FX-6300 w/ 8GB, SSD. Or R9 280x and i5-4670k w/ 16GB. These are the most common setups.
junky77 - Saturday, February 1, 2014 - link
I think you should try and test BF4 on a multiplier scenes with a lot of players in one place. It might be more interesting for many and it stress the CPU more in some cases
Also, please test BF4 with some midrange AMD CPU
Ian Cutress - Saturday, February 1, 2014 - link
How do you propose we make that into a repeatable and honest benchmark, then do it over a dozen (or dozens) of setups and settings?
blanarahul - Saturday, February 1, 2014 - link
Offline Multiplayer?? You know, like Counter Strike.
junky77 - Saturday, February 1, 2014 - link
you obviously can't repeat it identically (unless there is some replay option) and it will take maybe too much time. but :
1. You can test the same map with same number of players, as I'm sure you would
2. Run your character through the same path in the map, running into a lot of people. You should be dead quickly enough and be respawned, so maybe it will require reasonable time to test
3. Even getting a consistent minimal FPS could be nice
4. I know it is hard and not accurate, but if the results are consistent enough (like getting the minimal FPS) in some maps, at least it can give players some idea about BF4 multiplayer performance
I also guess that if the difference between Mantle and non-Mantle gaming is big enough in case of multiplayer, we'll probably see it
Personally, I've tried to benchmark the multiplayer and though it was hard due to a lot of bugs, the FPSs were quite consistent for the same map
junky77 - Saturday, February 1, 2014 - link
also, I didn't mean to offend if it sounded like that
chizow - Saturday, February 1, 2014 - link
Run a closed server.
Benchmark baseline control with 0 players.
Benchmark server populated with 10-20 editors or volunteers, just parked in vehicles and fixed locations. Even have them fire on fixed, non-destructible locations. If the results are different enough to show a solid CPU load and difference in performance from the control run, just go with that.
The server should still have to account for the players even thought they aren't really doing anything, but once you have them acting and destroying things that starts introducing variability.
The argument to validate these results is that even in real multiplayer games, your framerate should still remain relatively consistent when the round opens and you are just running to the first objective vs in the heat of combat, ie. you don't necessarily get more CPU load just by having more actions onscreen.
yannigr - Saturday, February 1, 2014 - link
These results look interesting
http://pclab.pl/art55953-3.html
Now cutting cores on a high end Intel processor is not the best way to come to conclusions. An Intel core is much different to an AMD core and maybe faster than a core in a Pentium processor (more cache available etc.). In the link above Mantle looks really really interesting, much more than what you found here with your preview.
mmrezaie - Saturday, February 1, 2014 - link
That was a great article +yannigr
blanarahul - Saturday, February 1, 2014 - link
I threw up after seeing the Multiplayer results. 90% performance gain in MP?? I seriously doubt that.