Phenom II enhancements
Built largely on the Phenom architecture, Phenom II has been tweaked to address some of Phenom’s weaknesses. Chief among them was frequency scaling. As you all know by now, AMD’s original plans for Phenom’s launch a little over a year ago had to be thrown out the window, with the company launching at speeds topping out at just 2.3GHz. That’s 300MHz shy of the 2.6GHz that was expected.
Over time AMD’s engineers managed to massage more speed out of the architecture, ultimately culminating with the 2.6GHz Phenom 9950. They eked yet more speed out of the core thanks to ACC (Advanced Clock Calibration), which was integrated into AMD’s SB750 South Bridge and 790GX chipset (newer 790FX motherboards were also updated to include SB750 and ACC). Even so, Phenom never really lived up to its full potential.
AMD took the lessons learned through developing ACC with 65-nm Phenom and baked them into their 45-nm Phenom II silicon. As a result, ACC no longer provides the OC’ing benefits it did previously with 65-nm Phenom parts. In the words of AMD: “you can just as well leave ACC off for Phenom II OC testing. Since, the "go-fast" things we learned from ACC (and those CPU parameter adjustments) were factored into 45nm, the benefit unlocked previously by ACC in 65nm silicon is already being realized without having to use the ACC feature separately.”
Over the Christmas holiday (before hearing this from AMD) we’d already attempted OC’ing our Phenom II CPU and sure enough, ACC provided no benefit when OC’ing the processor. We actually thought something was wrong with our 790GX motherboard until we heard back from AMD.
But that wasn’t the only tweak made with 45-nm. AMD’s new 45-nm manufacturing process continues to utilize strained silicon and silicon-on-insulator (SOI), but new to the manufacturing process is immersion lithography. With immersion lithography, liquid is used between the projection lens and the wafer. According to AMD, this improves focus and provides a 40% gain in resolution versus conventional lithography. AMD believes immersion lithography is more efficient than Intel’s approach, and also notes that Intel won’t adopt it until it switches to 32-nm.
The smaller process also improves energy efficiency. To further reduce power consumption, though, AMD has incorporated additional power states, including a new 800MHz low-power P-state, as well as cache flush on halt. With 65-nm Phenom, an idle processing core would have to continue operating (albeit at a lower clock speed) in order to keep the data in its L1 and L2 caches available to the other cores. In contrast, when an idle core in AMD’s 45-nm Phenom II enters halt state, it flushes the contents of its L1 and L2 caches into L3, which is a shared pool of memory that is accessible to the other cores. The idle core then essentially shuts down to save power.
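To make the difference concrete, here is a minimal toy model of cache flush on halt. It is only a sketch of the behavior described above, not AMD’s implementation: the Core and Chip classes, the flush_on_halt flag, and the address 0x1000 are all hypothetical names we made up for illustration, and the dictionaries merely stand in for real caches.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.private_cache = {}  # stands in for the core's L1 + L2 (toy model)
        self.powered = True


class Chip:
    def __init__(self, n_cores, flush_on_halt):
        self.cores = [Core(f"core{i}") for i in range(n_cores)]
        self.l3 = {}  # shared pool visible to every core
        self.flush_on_halt = flush_on_halt  # hypothetical switch: True models Phenom II

    def halt(self, core):
        if self.flush_on_halt:
            # Push private contents into the shared L3, then power the core down.
            self.l3.update(core.private_cache)
            core.private_cache.clear()
            core.powered = False
        # Otherwise the core stays powered so its caches can still be probed.

    def read(self, reader, addr):
        if addr in reader.private_cache:
            return reader.private_cache[addr]
        if addr in self.l3:
            return self.l3[addr]
        # Fall back to probing the other cores' private caches.
        for other in self.cores:
            if other is not reader and other.powered and addr in other.private_cache:
                return other.private_cache[addr]
        return None


for flush in (False, True):
    chip = Chip(2, flush_on_halt=flush)
    idle, busy = chip.cores
    idle.private_cache[0x1000] = "data"
    chip.halt(idle)
    print(flush, idle.powered, chip.read(busy, 0x1000))
```

In both cases the busy core still gets its data; the difference is that with the flush enabled the idle core no longer has to stay powered to serve it.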
Between the new manufacturing process, new power states, and other enhancements in Cool'n'Quiet 3.0, AMD estimates power savings of 40% at idle. Now obviously you won’t see all of that at a system level when testing at the wall, but this is a nice reduction that should also allow the new processors to generate less heat than 65-nm Phenom.
IPC enhancements
One of Phenom’s key weaknesses when compared against Core 2 Quad was its IPC (instructions per clock). High IPC was previously a hallmark of AMD’s Athlon/Athlon X2 when compared against Pentium 4/D, but today’s Phenom CPUs are simply behind Intel on a clock-for-clock basis.
Phenom II improves the situation a little, thanks mostly to its larger L3 cache. AMD has also managed to incorporate a few tweaks that should directly improve the number of instructions the CPU executes per clock cycle. AMD has added path-based indirect branch prediction to improve the processor’s ability to handle branch instructions. We were also told that one algorithm for handling branches has been optimized to slightly improve branch prediction.
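AMD didn’t describe how its predictor is built, so the following is only a toy sketch of the general idea behind path-based indirect branch prediction: instead of remembering a single last target per branch, the predictor also keys on the targets of the most recent indirect branches. The class names, the history length of 2, and the addresses in the trace are all our own assumptions.

```python
from collections import deque


class LastTargetPredictor:
    """Predicts that an indirect branch goes wherever it went last time."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        return self.table.get(pc)

    def update(self, pc, target):
        self.table[pc] = target


class PathBasedPredictor:
    """Keys the target table on the branch address plus recent targets (the path)."""

    def __init__(self, history_len=2):  # history length is an assumed value
        self.table = {}
        self.path = deque(maxlen=history_len)

    def predict(self, pc):
        return self.table.get((pc, tuple(self.path)))

    def update(self, pc, target):
        self.table[(pc, tuple(self.path))] = target
        self.path.append(target)


def accuracy(predictor, trace):
    hits = 0
    for pc, target in trace:
        if predictor.predict(pc) == target:
            hits += 1
        predictor.update(pc, target)
    return hits / len(trace)


# One indirect branch at a made-up address, cycling through three targets.
trace = [(0x400, t) for t in (0x10, 0x20, 0x30)] * 100

print(accuracy(LastTargetPredictor(), trace))
print(accuracy(PathBasedPredictor(), trace))
```

On this deliberately unfriendly trace the last-target scheme never guesses right, while the path-based one only misses during warm-up. Real code is nowhere near this regular, so don’t read the numbers as anything more than an illustration.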
Phenom II also boasts larger load/store and floating-point buffering. AMD wouldn’t provide specifics on how much larger the buffers are, but this tweak should improve missed buffer performance. AMD has also improved the processor’s handling of floating-point register-to-register move instructions.
The processing cores inside Phenom II can also probe their L1 and L2 caches twice as often as Phenom, effectively doubling core probe bandwidth. Enhanced pre-fetching allows Phenom II to recognize data access patterns and speculatively pre-fetch data that is likely to be needed into cache ahead of time.
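AMD didn’t detail which access patterns the pre-fetcher recognizes, so as a rough illustration we’ve assumed the simplest case: a constant stride. The simulate function below is a hypothetical toy with an unbounded cache, and the 64-byte step in the trace is likewise an assumption.

```python
def simulate(addresses, prefetch):
    """Count cache hits over a trace, with an optional toy stride pre-fetcher."""
    cache = set()  # unbounded, so nothing is ever evicted (toy simplification)
    hits = 0
    last_addr = None
    last_stride = None
    for addr in addresses:
        if addr in cache:
            hits += 1
        cache.add(addr)
        if prefetch and last_addr is not None:
            stride = addr - last_addr
            # Same non-zero stride twice in a row: fetch the next address early.
            if stride == last_stride and stride != 0:
                cache.add(addr + stride)
            last_stride = stride
        last_addr = addr
    return hits


# 100 sequential accesses, stepping by an assumed 64 bytes each time.
trace = list(range(0, 64 * 100, 64))

print(simulate(trace, prefetch=False))  # 0 hits: every access is new
print(simulate(trace, prefetch=True))   # 97 hits once the stride is learned
```

The first three accesses still miss while the pattern is being learned; after that, every access lands on data that was fetched ahead of time.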
Phenom II also features improved LOCK pipelining. Out-of-order CPUs like Athlon, Phenom, and Core 2 normally like to take instructions and reshuffle them in a way that maximizes the processor’s efficiency, but under certain conditions they have to execute code in the order it was originally written. Thanks to its improved LOCK pipelining, Phenom II is able to shuffle these instructions more than previous AMD CPUs, locking down less of the pipeline and thus improving Phenom II’s efficiency. This really pays dividends when multiple LOCKs are in flight simultaneously.
Finally, Phenom II boasts lower latency to data stored in L3 cache than Phenom. The exact amount is open to debate, and it may not show up at all in some synthetic benchmarks, but AMD has attempted to reduce L3 latency. Phenom II’s L3 is more associative than Phenom’s as well, with Phenom II featuring 48-way associative L3 cache versus 32-way associative L3 in Phenom. This should increase the L3 cache’s hit rate.
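Why would more ways help the hit rate? Here is a small sketch of a set-associative cache with LRU replacement. Everything in it is a toy assumption of ours: the hit_rate function, the 16-line capacity, the 2-way and 4-way configurations, and the LRU policy are chosen to keep the example readable and have nothing to do with AMD’s actual 32-way and 48-way designs.

```python
from collections import OrderedDict


def hit_rate(trace, num_lines, ways):
    """Hit rate of a toy set-associative cache with LRU replacement."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for line in trace:
        s = sets[line % num_sets]  # simple modulo set indexing
        if line in s:
            hits += 1
            s.move_to_end(line)  # mark as most recently used
        else:
            if len(s) >= ways:
                s.popitem(last=False)  # evict the least recently used line
            s[line] = True
    return hits / len(trace)


# Four cache lines that all collide in the same set, touched in a loop.
trace = [0, 8, 16, 24] * 50

print(hit_rate(trace, num_lines=16, ways=2))
print(hit_rate(trace, num_lines=16, ways=4))
```

With the same total capacity, the 2-way cache keeps evicting lines it is about to need, while the 4-way cache can hold all four colliding lines at once. Keep in mind that Phenom II’s L3 is also larger than Phenom’s, so the real-world change isn’t down to associativity alone.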