Updating AnandTech’s 2013 Mobile Benchmark Suite (RFC)
by Jarred Walton on January 29, 2013 9:45 PM EST
If it seems like just last year that we updated our mobile benchmark suite, that’s because it was. We’re going to be keeping some elements of the testing, but with the release of Windows 8 we’re looking to adjust other areas. This is also a request for input (RFC = Request for Comments, if you didn’t know) from our readers on benchmarks they would like us to run (or not run), specifically with regard to laptops and notebooks.
We used most of the following tests with the Acer S7 review, but we’re still early enough in the game that we can change things up if needed. We can’t promise we’ll use every requested benchmark, in part because there’s only so much time you can spend benchmarking before you’re basically generating similar data points with different applications, and also because ease of benchmarking and repeatability are major factors, but if you have any specific recommendations or requests we’ll definitely look at them.
General Performance Benchmarks
We’re going to be keeping most of the same general performance benchmarks as last year. PCMark 7, despite some question as to how useful the results really are, is at least a general performance suite that’s easy to run. (As a side note, SYSmark 2012 basically requires a fresh OS install to run properly, plus wiping and reinstalling the OS after running, which makes it prohibitively time consuming for laptop testing where every unit comes with varying degrees of customization to the OS that may or may not allow SYSmark to run.) We’re dropping PCMark Vantage this year, mostly because it’s redundant; if Futuremark comes out with a new version of PCMark, we’ll likely add that as well.
At least for the near term, we’re also including results for TouchXPRT from Principled Technologies; this is a “light” benchmark suite designed more for tablets than laptops (at least in our opinion), but it does provide a few other results separate from a monolithic suite like PCMark 7. We’ll also include results from WebXPRT for the time being, though again it seems more tablet-centric. We don’t really have any other good general performance benchmarking suites, so for other general performance benchmarks we’ll return once again to the ubiquitous Cinebench 11.5 and x264 HD. We’re updating to x264 HD 5.x, however, which does change the encoding somewhat, and if a version of x264 comes out with updated encoding support (e.g. for CUDA, OpenCL, and/or Quick Sync) we’ll likely switch to that when appropriate. We’re still looking for a good OpenCL benchmark or two; WinZip sort of qualifies, but unfortunately we’ve found in testing that 7-zip tends to beat it on file size, compression time, or both depending on the settings and files we use.
On the graphics side of the equation, there doesn’t seem to be a need to benchmark every single laptop on our gaming suite—how many times do we need to see how an Ultrabook with the same CPU and iGPU runs (or doesn’t run) games?—so we’ll continue using 3DMark as a “rough estimate” of graphics performance. As with PCMark, we’re dropping the Vantage version, but we’ll continue to use 3DMark06 and 3DMark 11, and we’ll add the new version “when it’s done”. We’re considering the inclusion of another 3D benchmark, CatZilla (aka AllBenchmark 1.0 Beta19), at the “Cat” and “Tiger” settings, but we’d like to hear feedback on whether it makes sense or not.
Finally, we’ll continue to provide analysis of display quality, and this is something we really hope to see improve in 2013. Apple has thrown down the gauntlet with their pre-calibrated MacBook, iPhone, iPad, and iMac offerings; if anyone comes out with a laptop that charges Apple prices but can’t actually match Apple on areas like the display, touchpad, and overall quality, you can bet we’ll call them to the carpet. Either be better than Apple and charge the same, or match Apple and charge less, or charge a lot less and don’t try to compete with Apple (which is a dead-end race to the bottom, so let’s try to at least have a few laptops that eschew this path).
As detailed in the Acer S7 review, we’re now ramping up the “difficulty” of our battery life testing. The short story is that we feel anything less than our previous Internet surfing test is too light to truly represent how people use their laptops, so we’re making that our Light test. For the Medium test, we’ll be increasing the frequency of page loads on our Internet test (from every 60 seconds down to every 12 seconds) and adding in playback of MP3 files. The Heavy test is designed not as a “worst-case battery life” test but rather as a “reasonable but still strenuous” use case for battery power, and we use the same Internet test as in the Medium test but add in looped playback of a 12Mbps 1080p H.264 video with a constant FTP download from a local server running at ~8Mbps (FileZilla Server with two simultaneous downloads and a cap of 500KBps, downloading a list of large movie files).
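As a quick sanity check on the figures above, the FTP throughput and page-load rates can be worked out in a few lines. This is only a sketch: the function names are made up for illustration and are not part of our test scripts, and the conversion assumes decimal units (1 KB = 1000 bytes).

```python
# Figures come from the text; names and the decimal-unit convention are assumptions.
FTP_STREAMS = 2          # simultaneous FileZilla downloads
CAP_KBYTES_PER_S = 500   # per-download speed cap in kilobytes per second


def ftp_rate_mbps(streams: int, cap_kbytes_per_s: float) -> float:
    """Aggregate download rate in megabits per second (1 KB = 1000 bytes)."""
    return streams * cap_kbytes_per_s * 8 / 1000


def page_loads_per_hour(interval_s: float) -> float:
    """Number of page loads in one hour for a given load interval."""
    return 3600 / interval_s


print(ftp_rate_mbps(FTP_STREAMS, CAP_KBYTES_PER_S))      # 8.0
print(page_loads_per_hour(60), page_loads_per_hour(12))  # 60.0 300.0
```

Two capped downloads therefore account for the ~8Mbps figure, and moving from a 60 second to a 12 second interval means five times as many page loads per hour.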
Other aspects of our battery testing also warrant clarification. For one, we continue to disable certain “advanced” features like Intel’s Display Power Saving Technology (which can adjust contrast, brightness, color depth, and other items in order to reduce power use). The idea seems nice, but it basically sacrifices image quality for battery life, and since other graphics solutions are not using these “tricks” we’re leaving it disabled. We also disable refresh rate switching for similar reasons: testing 40Hz on some laptops and 60Hz on others isn’t really apples-to-apples. Finally, we’re also moving from 100 nits brightness to 200 nits brightness for all the battery life testing, and the WiFi and audio will remain active (volume at 30% with headphones connected).
In truth, gaming is the one area where there is the most room for debate. Keep in mind that when testing notebooks, we’re not solely focused on GPU performance most of the time (even with gaming notebooks); the gaming tests are only a subset of all the benchmarks we run. We’ll try to overlap with our desktop GPU testing where possible, but we’ll continue to use 1366x768 ~Medium as our Value setting, 1600x900 ~High as our Mainstream setting, and 1920x1080 ~Max for our Enthusiast setting. Beyond the settings, however, is the question of which games to include.
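To put the three settings in perspective, here is a rough sketch of how the pixel count grows from one tier to the next. The `SETTINGS` table and helper names are illustrative assumptions; only the resolutions and quality labels come from the text.

```python
# Resolutions and quality labels are from the text; the table layout is an assumption.
SETTINGS = {
    "Value": (1366, 768, "Medium"),
    "Mainstream": (1600, 900, "High"),
    "Enthusiast": (1920, 1080, "Max"),
}


def pixel_count(width: int, height: int) -> int:
    """Pixels rendered per frame at a given resolution."""
    return width * height


def relative_pixel_load(tier: str, baseline: str = "Value") -> float:
    """Pixel count of a tier as a multiple of the baseline tier."""
    w, h, _ = SETTINGS[tier]
    bw, bh, _ = SETTINGS[baseline]
    return pixel_count(w, h) / pixel_count(bw, bh)


for tier, (w, h, preset) in SETTINGS.items():
    print(f"{tier}: {w}x{h} ~{preset}, {relative_pixel_load(tier):.2f}x Value pixels")
```

Pixel count alone understates the gap between tiers, since the quality preset rises along with the resolution.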
Ideally, we’d like to have popular games that also tend to be strenuous on the graphics (and possibly the CPU as well). A game or benchmark that is extremely demanding of your graphics hardware that few people actually play isn’t relevant, and likewise a game that’s extremely popular but that doesn’t require much from your hardware (e.g. Minecraft) is only useful for testing low-end GPUs. We would also like to include representatives of all the major genres—first person shooter/action, role-playing, strategy, and simulation—with the end goal of having ten or fewer titles (and for laptops eight seems like a good number). Ease of benchmarking is also a factor; we can run FRAPS on any game, but ideally a game with a built-in benchmark is both easier to test and produces more reliable/repeatable results. Frankly, at this point we don’t have all that many titles that we’re really set on including, but here’s the short list.
Elder Scrolls: Skyrim: We’ve been using this title since it came out, and while it may not be the most demanding game out there, it is popular and it’s also more demanding (and scalable) than most other RPGs that come to mind. For example, Mass Effect 3 generally has lower quality (also DX9-only) graphics and doesn’t require as much from your hardware, and The Witcher 2 has three settings: High, Very High, and Extreme (not really, but it doesn’t scale well to lower performance hardware). Skyrim tends to hit both the CPU and GPU quite hard, and even with the high resolution texture pack it can still end up CPU limited on some mobile chips. Regardless of our concerns, however, we can’t think of a good RPG replacement, so our intention is to keep Skyrim for another year.
Far Cry 3: This is an AMD-promoted title, which basically means they committed some resources to helping with the game’s development and/or advertising. In theory, that means it should run better on AMD hardware, but as we’ve seen in the past that’s not always the case. This is a first-person shooter that has received good reviews and it’s a sequel to a popular franchise with a reputation for punishing GPUs, making it a good choice. It doesn’t have a built-in benchmark, so we’ll use FRAPS on this one.
Sleeping Dogs: This is another AMD-promoted title. This is a sandbox shooter/action game with a built-in benchmark, making it a good choice. Yes, right now that's two for AMD and none for NVIDIA, but that will likely change with the final list.
Sadly, that’s all we’re willing to commit to at this point, as all of the other games under consideration have concerns. MMORPGs tend to be a bit too variable, depending on server load and other aspects, so we’re leaving out games like Guild Wars 2, Rift, etc. For simulation/racing games, DiRT: Showdown feels like a step back from DiRT 3 and even DiRT 2; the graphics are more demanding, yes, but the game just isn’t that fun (IMO and according to most reviews). That means we’re still in search of a good racing game; Need for Speed Most Wanted is a possibility, but we’re open for other suggestions.
Other titles we’re considering but not committed to include Assassin’s Creed III, Hitman: Absolution, and DmC: Devil May Cry; if you have any strong feelings for or against the use of those titles, let us know. Crysis 3 will hopefully make the grade this time, as long as there's no funny business at launch or with the updates (e.g. no DX11 initially, and then when it was added the tessellation was so extreme that it heavily favored NVIDIA hardware, even though much of the tessellation was being done on flat surfaces). Finally, we’re also looking for a viable strategy game; Civ5 and Total War: Shogun 2 could make a return, or there are games like Orcs Must Die 2 and XCOM: Enemy Unknown, but we’re not sure if either meets the “popular and strenuous” criteria, so we may just hold off until StarCraft II: Heart of the Swarm comes out (and since that game’s on “Blizzard time”, it could be 2014 before it’s done, though tentatively it’s looking like March; hopefully it will be able to use more than 1.5 CPU cores this time).
As stated at the beginning, this is a request for comments and input as much as a list of our plans for the coming year. If you have any strong feelings one way or the other on these benchmarks, now is the time to be heard. We’d love to be able to accommodate every request, but obviously there are time constraints that must be met, so tests that are widely used and relevant are going to be more important than esoteric tests that only a few select people use. We also have multiple laptop reviewers (Dustin, Jarred, and occasionally Vivek and Anand), so the easier it is to come up with a repeatable benchmark scenario the better. Remember: these tests are for laptops and notebooks, so while it would be nice to do something like a compilation benchmark, those can often take many hours just to get the right files installed on a system, which is why we’ve shied away from such tests so far. But if you can convince us of the utility of a benchmark, we’ll be happy to give it a shot.
Nexing - Wednesday, January 30, 2013
So far, the professional audio world has been completely sidelined in this regard. Latency-wise, computers, and notebooks even more so, have presented problems not yet properly addressed by manufacturers.
Dawbench's test is widely utilized, particularly by live acts and nowadays by DJs going pro, but it is nowhere to be seen on the computer side, hence it is missing from manufacturers' radar. I do hope and expect AnandTech helps to bridge this too-longstanding gap.
JarredWalton - Wednesday, January 30, 2013
Given the need for additional software besides just the DAW Bench files, this is probably too much to coordinate -- we'd need professional software (that most likely none of us have used). Is something like DPC Latency Checker sufficient, or is that too simplistic?
Nexing - Wednesday, January 30, 2013
DPC is the ONLY way to start configuring a laptop that has been already bought and implemented...
And it only shows latency peaks over a timeline. Most available solutions are in the area of disabling running services (antivirus, Bluetooth, wireless, and the rest not connected with the actual performance) or sequential tweaking of preferences and options of the involved software, and lastly disabling plug-ins or reducing the load of musical layers because the notebook cannot actually handle what is being put through... and it usually shows it with dropouts, clicks, or BSODs.
In these times, when laptops achieve C9 latencies at 1600 MHz with 16GB or more RAM, and when responsive SSDs may give us access to a new, lower floor of ns latencies, regular computer users who visit these technical websites cannot imagine that OVERALL latencies reach well over 5ms from pressing a MIDI controller button to a FireWire or USB out (not even counting the extra latency introduced by the required external soundcard), and commonly exceed that figure several times over. These are numbers that affect performing musicians.
As it is so far, as a computer-buying segment, we mostly opt for Macs to reduce those critical risks in the midst of a musical performance. Or, if we are Windows users, we buy notebooks with ExpressCard connectors to have the choice to solve the last-stage latency problem commonly introduced by certain USB/FireWire chipsets.
Basically, the problem is that there is nowhere to find a public bench or a standard that shows those latency numbers for commercially available notebooks. On the contrary, latencies for high-end soundcards are standard in their specifications, but one cannot perform without the other, and we have no way to know before buying what performance we will get.
I understand that this is a complex task, and I am sure specialized studio or pro audio communities will gladly help to approach the needed standards. So far there has been no communication to fill the gap between computers and pro audio, just unilateral efforts like Dawbench. It would be in the interest of quality manufacturers, users, and reviewers.
Agustin - Thursday, January 31, 2013
Hey Jarred, if you are planning to run the test bench on W8, you will have a problem: if you check the DPC latency, you will find that in W8 it is ALWAYS at 1000 us, which is the highest value you can have if you want a pleasant experience with your sound system. This is something that Microsoft implemented on purpose in the Windows 8 kernel to reduce CPU power draw, especially in tablets. It isn't a problem for the common user, and the only way to reduce the DPC value is to stress the CPU to the max, at which point the values drop drastically, to the order of 2 or 10 us.
RightMark Audio Analyzer is another program you can use to check the dB the integrated chip can deliver.
And sorry for my English, Jarred, I'm from Argentina, Santiago del Estero.
ToTTenTranz - Wednesday, January 30, 2013
I think Skyrim should be tested with some of the quality enhancement mods.
In my opinion, it doesn't even make much sense to play Skyrim on a PC without using mods, since they provide a generational leap in image quality.
Just go to the Steam Workshop, choose the top 5 IQ mods and list them in the benchmark description. It should be easy to reproduce for any reader.
I think XCOM and Orcs Must Die 2 should only be present in benchmarks for iGPUs and low-end discrete GPUs (if there will ever be such a thing in the future).
I don't think it would be very interesting to see 300 FPS in XCOM when testing the higher-end GeForce GTX 78x and Radeon HD 89xx.
JarredWalton - Wednesday, January 30, 2013
We've commented on this in the past (and above I made a short comment in regards to Minecraft mods), but the short summary is: no way. Simply making a game more demanding isn't the goal; the goal is to test popular games the way most people will play them. On laptops, we're already running a couple tiers down from desktop GPUs in terms of performance, so making a game even more demanding (and often overflowing the limited VRAM with some mods) just doesn't make sense.
As for Orcs Must Die 2 and XCOM, your opinion is duly noted. It looks like they won't make the cut, which means we're still looking for a good strategy game or two to add to the list.
bostontx - Wednesday, January 30, 2013
I love the Civ5 benchmarks, since it is one game that seems to stress both CPU & GPU. Also, it's my go-to game when I'm bored at the airport or at my hotel on the road. As laptop graphics get better, the playability should increase in the later stages of the game.
powerarmour - Wednesday, January 30, 2013
Catzilla please :)
IanCutress - Wednesday, January 30, 2013
I find the Catzilla final score is too heavily influenced by the 'load time', which is for whatever reason part of their score calculation. This means that an odd read due to cache or other factors can impact the score a fair bit. I'm tempted to use it in my motherboard reviews when multi-GPU is sorted and I upgrade the drivers, although I'd be using the CPU+GPU score result, not the 'final score'.
dragosmp - Wednesday, January 30, 2013
In the last year, frame time testing has developed into a pretty useful tool to assess the overall performance of a system. Primarily it has been used for graphics card testing, but it can be used just as well for CPUs. It should be an excellent test for laptops, since it would assess how smooth the frame delivery of the overall system is, including the CPU, GPU, and driver(s).
My opinion, and hopefully it's shared, is that within a given time it's better to test a few representative games thoroughly for FPS and frame time than more games only for FPS.
My choices: no 3DMark, only games: Far Cry 3, DiRT 3, Civ 5 (late game sim), and Skyrim.
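The frame time approach suggested in the comment above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual test tooling: it assumes a log of cumulative per-frame timestamps in milliseconds (the kind of data a frame time capture tool produces), the sample values are made up, and the nearest-rank percentile is just one of several reasonable definitions.

```python
import math


def frame_times_ms(timestamps_ms):
    """Convert cumulative timestamps into per-frame durations."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def average_fps(timestamps_ms):
    """Average frames per second over the whole capture."""
    elapsed_ms = timestamps_ms[-1] - timestamps_ms[0]
    return 1000.0 * (len(timestamps_ms) - 1) / elapsed_ms


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample: mostly 16 ms frames with a single 50 ms hitch.
stamps = [0.0, 16.0, 32.0, 48.0, 98.0, 114.0]
times = frame_times_ms(stamps)

print(round(average_fps(stamps), 1))  # 43.9
print(percentile(times, 99))          # 50.0
```

The average of about 44 FPS looks acceptable on its own, while the 99th percentile frame time exposes the 50 ms hitch, which is the kind of stutter an FPS-only chart hides.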