Introduction
Normally I don’t go off on other websites in my rant articles. Believe it or not, reviewing hardware and games can be a difficult, time-consuming process, and I sympathize with my peers because it can be a tough job. But in their latest CPU review, hardware website [H]ardOCP called into question the testing methods of all other websites, based partly on false information and partly on subjective opinion. From the [H]ardOCP teaser:
We test Intel's Core 2 Duo and Extreme using real-world gaming. Don't let a bunch of canned benchmarks lie to you about gaming performance, real gameplay experience tells a different story. Unless of course you game at 800x600.
In the article itself, the author goes on to state:
Let's just cut to the chase. You will see a lot of gaming benchmarks today that just simply lie to you. That is right, you will see frames per second numbers that are at best total BS, and at their worst a terrible representation of what difference a new Intel Core 2 processor will make in your gaming experience.
One of the reasons they give for why their testing methods are better than those used by other sites is that timedemo benchmarks are “canned” runs that do not include physics or AI calculations. However, as a blanket statement this assumption is simply not true. The truth is that it varies from game to game: timedemos can and do involve physics and AI calculations.
Quake 4 has two options for benchmarking: the “timedemo” console command and the “playnettimedemo” command. The “playnettimedemo” command plays back demos recorded over the ‘net (or network), and benchmarking with it will include everything, including physics. This isn't a new development either, as Unreal Tournament 2004 has done this in botmatches for years.
And for the record, we did use the “playnettimedemo” command for our CPU benches in our Core 2 article.
What’s wrong with removing physics and AI anyway?
When testing a specific hardware component, whether it’s a graphics card, CPU, or anything else, as a reviewer you want to isolate the performance of that component as much as possible. That’s why reviewers will often test without sound, or without other apps like email or ICQ running in the background, even though that’s not necessarily how you do things in everyday use. After all, these variables can affect the true performance of the component you’re testing.
Timedemos that don’t include physics or AI are another tool that the reviewer can use to isolate the performance of a specific component, which is why they’re often used to test video cards. When you're testing for video performance, removing CPU bottlenecks is the best way to see the true speeds of the graphics adapters.
Testing with this method isn’t “lying” to readers; you’re showing the true performance of the hardware component you’re reviewing. Even if you want to argue that timedemos without AI and physics don’t simulate real-world use, that doesn’t change the fact that as long as you test every piece of hardware the same way, the comparison between them still holds.
In other words, ATI’s card isn’t going to suddenly come out ahead of NVIDIA’s because the timedemo didn’t include physics or AI. As long as you’re testing the hardware properly, the outcome shouldn’t be affected; only the frame rate will be a little different. Besides, taking out aspects like AI and physics removes one potential bottleneck that prevents you from seeing the true performance potential of the hardware. This is why timedemos that don’t use AI/physics can be good for testing graphics cards, or certain aspects of graphics cards.
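To put rough numbers on that argument, here is a minimal sketch in Python. It assumes a deliberately simple bottleneck model in which each frame takes as long as the slower of the CPU and GPU stages; real games are messier than that. The card names and every millisecond figure are hypothetical, picked only for illustration, and are not measurements from any review.

```python
def fps(cpu_ms, gpu_ms):
    """Frames per second under a simple bottleneck model: each frame
    takes as long as the slower of the CPU and GPU stages."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs in milliseconds (illustrative, not measured).
GPU_MS = {"card_a": 8.0, "card_b": 10.0}
CPU_RENDER_MS = 5.0      # CPU work needed just to feed the GPU
CPU_AI_PHYSICS_MS = 7.0  # extra CPU work when AI and physics are running


def compare(include_ai_physics):
    """Return the modeled frame rate of each card for one test setup."""
    cpu_ms = CPU_RENDER_MS
    if include_ai_physics:
        cpu_ms += CPU_AI_PHYSICS_MS
    return {card: fps(cpu_ms, gpu_ms) for card, gpu_ms in GPU_MS.items()}


without_load = compare(include_ai_physics=False)
with_load = compare(include_ai_physics=True)

for label, result in (("no AI/physics", without_load),
                      ("with AI/physics", with_load)):
    for card, rate in result.items():
        print(f"{label:>16}  {card}: {rate:.1f} fps")
```

With these made-up numbers, the two cards run at 125 and 100 fps when AI and physics are left out, and both land at about 83 fps once the extra CPU load is added. The gap between the cards gets hidden behind the CPU, but the slower card never pulls ahead.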
When testing with timedemos, we attempt to minimize and isolate the effect other system components can have on the item being reviewed, and to show how that component performs in comparison to other potential upgrades the reader may be considering, as well as to older products. The graphs are presented in a concise, easy-to-read format at multiple resolutions, since we realize that results can vary with resolution and that not everyone is stuck at one resolution.