Updating AnandTech’s 2013 Mobile Benchmark Suite (RFC)
by Jarred Walton on January 29, 2013 9:45 PM EST
If it seems like just last year that we updated our mobile benchmark suite, that’s because it was. We’re going to be keeping some elements of the testing, but with the release of Windows 8 we’re looking to adjust other areas. This is also a request for input (RFC = Request for Comments, if you didn’t know) from our readers on benchmarks they would like us to run (or not run), specifically with regard to laptops and notebooks.
We used most of the following tests with the Acer S7 review, but we’re still early enough in the game that we can change things up if needed. We can’t promise we’ll use every requested benchmark, in part because there’s only so much time you can spend benchmarking before you’re basically generating similar data points with different applications, and also because ease of benchmarking and repeatability are major factors, but if you have any specific recommendations or requests we’ll definitely look at them.
General Performance Benchmarks
We’re going to be keeping most of the same general performance benchmarks as last year. PCMark 7, despite some question as to how useful the results really are, is at least a general performance suite that’s easy to run. (As a side note, SYSmark 2012 basically requires a fresh OS install to run properly, plus wiping and reinstalling the OS after running, which makes it prohibitively time consuming for laptop testing where every unit comes with varying degrees of customization to the OS that may or may not allow SYSmark to run.) We’re dropping PCMark Vantage this year, mostly because it’s redundant; if Futuremark comes out with a new version of PCMark, we’ll likely add that as well.
At least for the near term, we’re also including results for TouchXPRT from Principled Technologies; this is a “light” benchmark suite designed more for tablets than laptops (at least in our opinion), but it does provide a few other results separate from a monolithic suite like PCMark 7. We’ll also include results from WebXPRT for the time being, though again it seems more tablet-centric. We don’t really have any other good general performance benchmarking suites, so for other general performance benchmarks we’ll return once again to the ubiquitous Cinebench 11.5 and x264 HD. We’re updating to x264 HD 5.x, however, which does change the encoding somewhat, and if a version of x264 comes out with updated encoding support (e.g. for CUDA, OpenCL, and/or Quick Sync) we’ll likely switch to that when appropriate. We’re still looking for a good OpenCL benchmark or two; WinZip sort of qualifies, but unfortunately we’ve found in testing that 7-zip tends to beat it on file size, compression time, or both depending on the settings and files we use.
On the graphics side of the equation, there doesn’t seem to be a need to benchmark every single laptop on our gaming suite—how many times do we need to see how an Ultrabook with the same CPU and iGPU runs (or doesn’t run) games?—so we’ll continue using 3DMark as a “rough estimate” of graphics performance. As with PCMark, we’re dropping the Vantage version, but we’ll continue to use 3DMark06 and 3DMark 11, and we’ll add the new version “when it’s done”. We’re considering the inclusion of another 3D benchmark, CatZilla (aka AllBenchmark 1.0 Beta19), at the “Cat” and “Tiger” settings, but we’d like to hear feedback on whether it makes sense or not.
Finally, we’ll continue to provide analysis of display quality, and this is something we really hope to see improve in 2013. Apple has thrown down the gauntlet with their pre-calibrated MacBook, iPhone, iPad, and iMac offerings; if anyone comes out with a laptop that charges Apple prices but can’t actually match Apple in areas like the display, touchpad, and overall quality, you can bet we’ll call them on the carpet. Either be better than Apple and charge the same, or match Apple and charge less, or charge a lot less and don’t try to compete with Apple (which is a dead-end race to the bottom, so let’s try to at least have a few laptops that eschew this path).
As detailed in the Acer S7 review, we’re now ramping up the “difficulty” of our battery life testing. The short story is that we feel anything less than our previous Internet surfing test is too light to truly represent how people use their laptops, so we’re making that our Light test. For the Medium test, we’ll be increasing the frequency of page loads on our Internet test (from every 60 seconds down to every 12 seconds) and adding in playback of MP3 files. The Heavy test is designed not as a “worst-case battery life” test but rather as a “reasonable but still strenuous” use case for battery power, and we use the same Internet test as in the Medium test but add in looped playback of a 12Mbps 1080p H.264 video with a constant FTP download from a local server running at ~8Mbps (FileZilla Server with two simultaneous downloads and a cap of 500KBps, downloading a list of large movie files).
Other aspects of our battery testing also warrant clarification. For one, we continue to disable certain “advanced” features like Intel’s Display Power Saving Technology (which can adjust contrast, brightness, color depth, and other items in order to reduce power use). The idea seems nice, but it basically sacrifices image quality for battery life, and since other graphics solutions are not using these “tricks” we’re leaving it disabled. We also disable refresh rate switching, for similar reasons: testing 40Hz on some laptops and 60Hz on others isn’t really apples-to-apples. Finally, we’re also moving from 100 nits brightness to 200 nits brightness for all the battery life testing, and the WiFi and audio will remain active (volume at 30% with headphones connected).
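As a rough summary of the three battery tests and the common settings described above, the following Python sketch encodes each test as a small record and checks the FTP arithmetic (two streams capped at 500KBps each works out to roughly 8Mbps). The class and field names are hypothetical, chosen only for illustration, and the sketch assumes 1KB = 1000 bytes and that MP3 playback is not part of the Heavy test, which the description above leaves unspecified.

```python
from dataclasses import dataclass

# Settings shared by all three tests, as stated in the text.
BRIGHTNESS_NITS = 200
VOLUME_PERCENT = 30  # with headphones connected


@dataclass(frozen=True)
class BatteryTest:
    """Hypothetical record describing one battery life scenario."""
    name: str
    page_load_interval_s: int      # seconds between page loads in the Internet test
    mp3_playback: bool = False
    video_loop: bool = False       # looped 12Mbps 1080p H.264 video
    ftp_streams: int = 0           # simultaneous FTP downloads
    ftp_cap_kbytes_per_s: int = 0  # per-stream cap in KB/s

    def ftp_mbps(self, bytes_per_kb: int = 1000) -> float:
        """Aggregate FTP rate in megabits per second (assumes 1KB = 1000 bytes)."""
        total_bytes_per_s = self.ftp_streams * self.ftp_cap_kbytes_per_s * bytes_per_kb
        return total_bytes_per_s * 8 / 1_000_000


TESTS = {
    "Light": BatteryTest("Light", page_load_interval_s=60),
    "Medium": BatteryTest("Medium", page_load_interval_s=12, mp3_playback=True),
    # Assumption: the Heavy test swaps MP3 playback for the looped video.
    "Heavy": BatteryTest("Heavy", page_load_interval_s=12, video_loop=True,
                         ftp_streams=2, ftp_cap_kbytes_per_s=500),
}

if __name__ == "__main__":
    for test in TESTS.values():
        print(f"{test.name}: page load every {test.page_load_interval_s}s, "
              f"FTP ~{test.ftp_mbps():.1f}Mbps")
```

With binary kilobytes (passing `bytes_per_kb=1024`), the Heavy figure comes out slightly higher, which is consistent with the “~8Mbps” approximation.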
In truth, gaming is the one area where there is the most room for debate. Keep in mind that when testing notebooks, we’re not solely focused on GPU performance most of the time (even with gaming notebooks); the gaming tests are only a subset of all the benchmarks we run. We’ll try to overlap with our desktop GPU testing where possible, but we’ll continue to use 1366x768 ~Medium as our Value setting, 1600x900 ~High as our Mainstream setting, and 1920x1080 ~Max for our Enthusiast setting. Beyond the settings, however, is the question of which games to include.
Ideally, we’d like to have popular games that also tend to be strenuous on the graphics (and possibly the CPU as well). A game or benchmark that is extremely demanding of your graphics hardware that few people actually play isn’t relevant, and likewise a game that’s extremely popular but that doesn’t require much from your hardware (e.g. Minecraft) is only useful for testing low-end GPUs. We would also like to include representatives of all the major genres—first person shooter/action, role-playing, strategy, and simulation—with the end goal of having ten or fewer titles (and for laptops eight seems like a good number). Ease of benchmarking is also a factor; we can run FRAPS on any game, but ideally a game with a built-in benchmark is both easier to test and produces more reliable/repeatable results. Frankly, at this point we don’t have all that many titles that we’re really set on including, but here’s the short list.
Elder Scrolls: Skyrim: We’ve been using this title since it came out, and while it may not be the most demanding game out there, it is popular and it’s also more demanding (and scalable) than most other RPGs that come to mind. For example, Mass Effect 3 generally has lower quality (also DX9-only) graphics and doesn’t require as much from your hardware, and The Witcher 2 has three settings: High, Very High, and Extreme (not really, but it doesn’t scale well to lower performance hardware). Skyrim tends to hit both the CPU and GPU quite hard, and even with the high resolution texture pack it can still end up CPU limited on some mobile chips. Regardless of our concerns, however, we can’t think of a good RPG replacement, so our intention is to keep Skyrim for another year.
Far Cry 3: This is an AMD-promoted title, which basically means they committed some resources to helping with the game’s development and/or advertising. In theory, that means it should run better on AMD hardware, but as we’ve seen in the past that’s not always the case. This is a first-person shooter that has received good reviews and it’s a sequel to a popular franchise with a reputation for punishing GPUs, making it a good choice. It doesn’t have a built-in benchmark, so we’ll use FRAPS on this one.
Sleeping Dogs: This is another AMD-promoted title. This is a sandbox shooter/action game with a built-in benchmark, making it a good choice. Yes, right now that's two for AMD and none for NVIDIA, but that will likely change with the final list.
Sadly, that’s all we’re willing to commit to at this point, as all of the other games under consideration have concerns. MMORPGs tend to be a bit too variable, depending on server load and other aspects, so we’re leaving out games like Guild Wars 2, Rift, etc. For simulation/racing games, DiRT: Showdown feels like a step back from DiRT 3 and even DiRT 2; the graphics are more demanding, yes, but the game just isn’t that fun (IMO and according to most reviews). That means we’re still in search of a good racing game; Need for Speed Most Wanted is a possibility, but we’re open to other suggestions.
Other titles we’re considering but not committed to include Assassin’s Creed III, Hitman: Absolution, and DmC: Devil May Cry; if you have any strong feelings for or against the use of those titles, let us know. Crysis 3 will hopefully make the grade this time, as long as there’s no funny business at launch or with the updates (e.g. Crysis 2 had no DX11 initially, and then when it was added the tessellation was so extreme that it heavily favored NVIDIA hardware, even though much of the tessellation was being done on flat surfaces). Finally, we’re also looking for a viable strategy game; Civ5 and Total War: Shogun 2 could make a return, or there are games like Orcs Must Die 2 and XCOM: Enemy Unknown, but we’re not sure if either meets the “popular and strenuous” criteria, so we may just hold off until StarCraft II: Heart of the Swarm comes out (and since that game’s on “Blizzard time”, it could be 2014 before it’s done, though tentatively it’s looking like March; hopefully it will be able to use more than 1.5 CPU cores this time).
As stated at the beginning, this is a request for comments and input as much as a list of our plans for the coming year. If you have any strong feelings one way or the other on these benchmarks, now is the time to be heard. We’d love to be able to accommodate every request, but obviously there are time constraints that must be met, so tests that are widely used and relevant are going to be more important than esoteric tests that only a few select people use. We also have multiple laptop reviewers (Dustin, Jarred, and occasionally Vivek and Anand), so the easier it is to come up with a repeatable benchmark scenario the better. Remember: these tests are for laptops and notebooks, so while it would be nice to do something like a compilation benchmark, those can often take many hours just to get the right files installed on a system, which is why we’ve shied away from such tests so far. But if you can convince us of the utility of a benchmark, we’ll be happy to give it a shot.
JarredWalton - Wednesday, January 30, 2013
As noted in the article, I primarily include 3DMark because I can't go back and retest most of the old laptops -- they're sent back when we finish the reviews. Thus, it's good to have at least a few graphics tests that give us an estimate of graphics performance over a long period of time. Besides that, 3DMark is extremely easy to install and run, taking less than an hour to complete several runs of each version, and it can be automated (start, go make lunch or write something, come back and it's done).
As for thoroughly testing a few select games rather than less in depth testing of more games, we'll see what others think. In the past, the general consensus has been "more more more" to both games and testing. I think cutting down the list to a few titles just encourages companies to highly optimize for those titles and ignore all others, so I'm strongly opposed to such a course.
Alkapwn - Wednesday, January 30, 2013
One of the things I would like to see done is some sort of benchmark on how well the device handles full workloads before it generates enough heat to throttle the GPU/CPU.
On the various notebooks and portables I've tested, I ran Prime95 and Furmark, or Intel's tool, or some combination of stress test tools to see how long it takes to go from room temperature idle to CPU/GPU throttle temperatures.
Great laptop designs handle heat well, whereas others do not.
mrdude - Wednesday, January 30, 2013
Ditto. I'd definitely urge you guys to pay closer attention to throttling and TDP limitations as chip makers push the bounds of performance in smaller form factors. When comparing competing laptops and devices with similar hardware, the cooling is likely to be the biggest determining factor with respect to performance differentiation.
slashdotcomma - Wednesday, January 30, 2013
I would be interested in seeing WiFi performance at, say, 25', 50', and 100', or whatever distance is deemed valuable. I find that WiFi performance, even with the same chipset, varies based on the mobile device's wireless antenna and its placement. I know typical reviews show 2.4GHz and 5GHz speeds, but it would be nice to know what to expect from the antenna strength too.
IanCutress - Wednesday, January 30, 2013
Hopefully Jarred has access to an open field and it never rains :) I would have to go around to the 20+ flats in the local vicinity and ask them all to turn off their microwaves and home WiFi.
JarredWalton - Wednesday, January 30, 2013
Bingo, Ian. Not only do I not have a great testing location (there are at present nine 2.4GHz networks in range of my house), but we have other editors in different locations with different testing equipment (routers) and environments. I think the best we can do is to do a short test or two at "ideal" range (less than five feet) and at a moderate distance of 25 feet from the router, with the understanding that results aren't really comparable between reviews.
Boogaloo - Wednesday, January 30, 2013
I'd like to see Dwarf Fortress benchmarks on mobile devices. Most laptops have sub-par graphics performance but decent to good processors these days, and Dwarf Fortress is obviously almost entirely CPU (and apparently memory bandwidth) bound. All laptops should be capable of running it at least decently, but it really hammers your CPU once the fort population rises above 100. It's a good game that can produce a meaningful benchmark on any laptop, not just the ones with good dedicated graphics.