Benchmarking Principles
Ask what a benchmark is for, and the wrong answer is: to provide an accurate and precise measure of performance. That's not the point. That's what a good benchmark does, but that's not its purpose. In medicine, a good lab test needs to be accurate and precise, but you only order a lab test when the results will change what you do with a patient. The same concept applies here. The purpose of a CPU benchmark is to provide information that helps you decide which of two CPUs to buy, or which of two tweaks to perform. What you want to know is which CPU gives you the best performance for the games and applications you run.
Thus, the question a benchmark should answer seems clear, right?
What is going to give me the fastest performance for the applications I run?
When it comes to games, it’s pretty easy to pick out which ones are important. A good number of people play Half-Life 2 and the associated Counter-Strike: Source, and a lot of people were into Doom 3… but the rest of us play more than those two games, and in fact, I’m not sure that many people are still playing Doom 3. To get the best sense of how fast a product is, you'd want performance data on every game out there. Readers would then check off the boxes of the games they play and see a computer-generated result listing the optimal hardware configuration (patent pending).
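As a rough sketch of that checkbox idea: suppose we had a table of frame rates for every CPU in every game. The function below ranks hardware using only the games a reader ticks. Everything here is an assumption made for illustration: the CPU names, the frame rates, the `rank_hardware` helper, and the choice of a geometric mean to combine results (so that one very high frame rate doesn't dominate the score).

```python
from statistics import geometric_mean

# Hypothetical frames-per-second results, indexed as FPS[cpu][game].
# The CPU names and every number here are invented for illustration.
FPS = {
    "CPU A": {"Half-Life 2": 120.0, "Counter-Strike: Source": 150.0, "Doom 3": 90.0},
    "CPU B": {"Half-Life 2": 110.0, "Counter-Strike: Source": 140.0, "Doom 3": 110.0},
}


def rank_hardware(fps, games_played):
    """Rank hardware by geometric-mean fps over the games the reader plays.

    Hardware with no result for one of the selected games is skipped,
    because it cannot be compared fairly on that selection.
    """
    if not games_played:
        raise ValueError("select at least one game")
    scores = {}
    for hardware, results in fps.items():
        if all(game in results for game in games_played):
            scores[hardware] = geometric_mean([results[g] for g in games_played])
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for selection in (["Half-Life 2", "Counter-Strike: Source"], ["Doom 3"]):
        best, score = rank_hardware(FPS, selection)[0]
        print(f"{selection}: {best} ({score:.1f} fps)")
```

With these made-up numbers, the winner flips depending on which boxes are ticked, which is exactly why the choice of games matters more than any single score.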
Unfortunately, it's not practical for reviewers to test every single game on the market, and this is where synthetic benchmarks should come in. The point of a synthetic benchmark isn't to evaluate the “theoretical” hardware performance but to serve as a marker of performance in those dozens of games you also play but that reviewers cannot test. This is why Doom 3 is still benchmarked even though it’s not an actively popular game: we know that future games will be released on the Doom 3 graphics engine. Results in Doom 3 will help predict performance in some of tomorrow’s games.
Taking this into account, a mistake of academic benchmarkers is to believe that the perfect benchmark is one that is open source, to ensure full transparency and to produce a truly vendor-neutral test. While this certainly helps people gain insight into the underlying engineering of a GPU or CPU, it’s not at all important for most end-users, for a simple reason:
Games aren't vendor-neutral. Applications aren’t vendor-neutral.