Defining processor performance
If there is one lesson to be learned from the AMD versus Intel CPU wars, it's that clock speed alone doesn't tell the whole story when it comes to processor performance. AMD's 1.2GHz Athlon demonstrated this convincingly last November when the Pentium 4 (P4) officially launched to the public. Despite giving up 300MHz to the P4, the 1.2GHz Athlon came out ahead in the majority of our tests. How can this be, you ask?
One theme you'll frequently hear from engineers at both Intel and AMD is that processor performance is equal to the number of instructions executed per clock cycle (IPC) multiplied by the processor's clock frequency:
Performance = IPC x Clock speed
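To make the equation concrete, here is a minimal Python sketch of it. The function names are our own, and the 1.5GHz figure for the P4 is inferred from the 300MHz gap mentioned above. No real IPC figures are used; the sketch only shows how much of an IPC advantage a slower-clocked chip needs in order to break even.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """Performance = IPC x clock speed, in arbitrary units."""
    return ipc * clock_ghz


def break_even_ipc_ratio(slow_clock_ghz: float, fast_clock_ghz: float) -> float:
    """IPC advantage the slower-clocked chip needs to match the faster one.

    Setting ipc_slow * slow_clock == ipc_fast * fast_clock and solving for
    ipc_slow / ipc_fast gives fast_clock / slow_clock.
    """
    return fast_clock_ghz / slow_clock_ghz


# Clock speeds from the comparison above: a 1.2GHz Athlon against a P4
# running 300MHz faster (assumed here to be 1.5GHz).
athlon_clock_ghz = 1.2
p4_clock_ghz = 1.5

ratio = break_even_ipc_ratio(athlon_clock_ghz, p4_clock_ghz)
print(f"Athlon needs about {ratio:.2f}x the P4's IPC to break even")
```

In other words, under this simple model any IPC advantage beyond roughly 25 percent is enough for the 1.2GHz part to pull ahead despite its lower clock speed.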
Intel's way: Clock speed first!
From there, the two companies take divergent paths. With its 20-stage pipeline, Intel's Pentium 4 microarchitecture trades work performed per clock for more pipeline stages. The upside of the P4's 20-stage pipeline, however, is that it allows the processor to scale to higher clock frequencies. While Pentium III and its 10-stage pipeline topped out at 1GHz when built on Intel's 0.18-micron manufacturing process, Pentium 4 is slated to soon hit 2GHz. Quite simply, Intel has designed the P4 to run at high clock speeds so that sheer clock frequency makes up for the lower IPC term in the processor performance equation.
Of course, by focusing on clock frequency, Intel gains the added "gee whiz" or "wow" factor in the minds of mainstream consumers, as clock frequency has traditionally been used to gauge a processor's performance in much the same way that engine horsepower has been used to gauge the performance of an automobile.
Keep in mind that this is an oversimplified picture; we still haven't taken into account the performance penalty of mispredicted branch instructions. With its 20-stage pipeline, the penalty for a mispredicted branch is much more severe on Pentium 4 than on Pentium III, since more in-flight work must be flushed and the longer pipeline refilled.
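A rough way to see why pipeline depth matters is the textbook stall model sketched below in Python. Every number in it is a hypothetical placeholder rather than a measurement of either chip, and equating the misprediction penalty with the pipeline depth is itself a simplifying assumption.

```python
def effective_ipc(base_ipc: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  penalty_cycles: float) -> float:
    """Estimate IPC after branch misprediction stalls.

    Works in cycles per instruction (CPI): the stall-free CPI plus the
    average stall cycles that mispredicted branches add per instruction.
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


# Hypothetical workload: 20% of instructions are branches, 5% of those are
# mispredicted. Penalty is assumed equal to pipeline depth (10 vs 20 stages).
for depth in (10, 20):
    ipc = effective_ipc(base_ipc=1.0, branch_fraction=0.2,
                        mispredict_rate=0.05, penalty_cycles=depth)
    print(f"{depth}-stage pipeline: effective IPC {ipc:.3f}")
```

With the same workload and predictor accuracy, the deeper pipeline loses more of its peak IPC, which is why a long pipeline depends so heavily on accurate branch prediction.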
AMD: A blend of clock speed and IPC
In contrast to Pentium 4's design, AMD has focused with Athlon on balancing IPC and clock frequency, aided by a few manufacturing enhancements in its HiP6L 0.18-micron process with copper interconnects at its fabrication facility in Dresden, Germany. In particular, Athlon uses a 10-stage pipeline for integer instructions (15 for floating point), so more work is performed in each stage. Combined with Athlon's clock speeds (currently up to 1.4GHz), this allows Athlon to largely keep up with the fastest Pentium 4 processors and, in many cases, outperform them.
With the next revision of the Athlon core, codenamed "Palomino", AMD has aimed to increase IPC, resulting in more performance at a given clock speed. In addition, the new core has been redesigned to reduce power consumption, allowing it to scale to higher clock speeds than previous 0.18-micron cores. In essence, with Palomino AMD has not only increased IPC over previous Athlon processors but also extended its frequency range: the thermal changes in Palomino will allow the chip to scale to clock speeds greater than 2GHz.
Let's take a quick look at the changes AMD has implemented in the new Palomino core.