Our real-world testing
Now that I’ve discussed the differences between benchmarking with demos and “real-world” usage with manual walkthroughs, I’ve got something to admit: we combined both techniques in our Core 2 article, and we’ve been using both for quite some time now when testing hardware. Any website that tests with Bethesda’s Oblivion, for example, is doing the same thing, since the game doesn’t have a built-in method for recording demos. Guess a lot of sites are more real-world than [H]ardOCP thinks, huh?
So how do we conduct our “real-world” testing with manual walkthroughs?
First, the key is to make your walkthrough sequence as repeatable as possible, so that you’re giving the same workload to the hardware you’re testing. If we went around shooting at objects and interacting with the environment, performance would change from run to run, introducing more variability into the benchmark results.
If you don’t try to minimize this variability, you essentially end up doing one thing with one piece of hardware and something totally different with the second. That’s not a very scientific way of testing. Timedemos, by contrast, apply the same load to both.
That’s why, when we do manual runs, we have to minimize our interaction with the environment. In other words, we don’t shoot at things and we don’t allow our character to be shot at. We also have to walk a tightrope, following the same set path as closely as we can every time. Even then there’s still some variability: in the Oblivion city area we used for our Core 2 review, for instance, I’ve observed that NPCs may take slightly different routes from one run to the next, and one or two may not show up at all.
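One way to put a number on that run-to-run variability is to repeat the same walkthrough several times on one configuration and compute the coefficient of variation (standard deviation divided by mean) of the per-run average frame rates. The sketch below assumes you already have one average-FPS figure per run; the function name `run_to_run_variation` and the sample numbers are hypothetical, not measurements from our Core 2 review.

```python
from statistics import mean, stdev


def run_to_run_variation(avg_fps_per_run):
    """Return (mean FPS, sample std dev, coefficient of variation in %)
    for repeated walkthroughs of the same path on the same hardware."""
    if len(avg_fps_per_run) < 2:
        raise ValueError("need at least two runs to estimate variability")
    m = mean(avg_fps_per_run)
    s = stdev(avg_fps_per_run)  # sample standard deviation (n - 1 denominator)
    return m, s, 100.0 * s / m


# Hypothetical average-FPS results from five walkthroughs of one path.
runs = [41.2, 40.8, 41.5, 39.9, 41.0]
m, s, cv = run_to_run_variation(runs)
print(f"mean {m:.1f} fps, stdev {s:.2f}, CV {cv:.1f}%")
```

A lower coefficient of variation means the walkthrough is more repeatable; scripted timedemos would typically show less spread than manual runs, which is exactly the trade-off described above.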
The long and short of it is that no single method can exactly replicate what the end user is going to experience, because everyone plays games differently. What you have to do is make things as even as possible so that the hardware is tested properly: you don’t want to give one piece of hardware a heavier load than the other. The load needs to be the same.
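Because manual runs are never perfectly identical, a gap between two components only tells you something if it’s larger than the noise in the runs themselves. Here’s a rough sketch of that check, assuming the same walkthrough was repeated several times on each component. The function name, the threshold of two standard deviations, and the FPS numbers are all illustrative assumptions; this is a rule of thumb, not a formal significance test or our published procedure.

```python
from statistics import mean, stdev


def difference_exceeds_noise(runs_a, runs_b, k=2.0):
    """Rough check: is the gap between the mean FPS of A and B larger than
    k times the larger of the two run-to-run standard deviations?"""
    gap = abs(mean(runs_a) - mean(runs_b))
    noise = max(stdev(runs_a), stdev(runs_b))  # worst-case run-to-run spread
    return gap > k * noise, gap, noise


# Hypothetical average-FPS results for the same walkthrough on two components.
hw_a = [41.2, 40.8, 41.5, 39.9, 41.0]
hw_b = [44.6, 45.1, 44.0, 44.9, 45.3]
meaningful, gap, noise = difference_exceeds_noise(hw_a, hw_b)
print(f"gap {gap:.2f} fps vs noise {noise:.2f} fps -> meaningful: {meaningful}")
```

If the gap is smaller than the spread between runs, the honest conclusion is that the two parts perform about the same on that walkthrough.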
As a result, manual walkthroughs can’t be quite as stressful as timedemos, because the walkthroughs don’t involve intense combat. We try to offset this by finding areas that put as much stress as possible on the component being tested; examples include the foliage tests in our GPU reviews and the city tests in our CPU reviews.