Prescott Cache, Core, and SSE3
Caching In
The Pentium 4 Extreme Edition is proof that Intel doesn’t need its 90nm process to manufacture chips with lofty transistor counts, but the smaller process does play an integral role in making complex processors profitable. The largest contributor to Prescott’s massive transistor increase is its enlarged caches. Intel left the instruction trace cache well enough alone, but it doubled the L1 data cache from 8KB to a 16KB, eight-way set-associative design. Moreover, Prescott’s L2 cache is now 1MB rather than 512KB. Connected through a 256-bit bus and running at full processor speed, the L2 cache theoretically has more than 102GB/s of bandwidth at its disposal at 3.2GHz.
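The bandwidth figure above follows from simple arithmetic. Here is a minimal sketch, assuming one 256-bit transfer per core clock and a 3.2GHz part; the function name is made up for illustration and GB means 10^9 bytes.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, clock_hz: float) -> float:
    """Theoretical peak in GB/s, assuming one full-width transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8  # 256 bits -> 32 bytes
    return bytes_per_transfer * clock_hz / 1e9


# Assumption: the L2 bus moves 256 bits on every cycle of a 3.2GHz core clock.
l2_peak = peak_bandwidth_gb_per_s(256, 3.2e9)
print(f"Theoretical L2 peak: {l2_peak:.1f} GB/s")
```

This is a ceiling, not a measurement; real-world throughput depends on access patterns and latency.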
![Prescott's die](images/06-s.jpg) Prescott's die
Modifying the Core
One attribute of the Prescott core that you won’t be seeing in any of Intel’s marketing material is its deeper execution pipeline. A deeper pipeline isn’t inherently bad, but it does reduce the number of instructions that can be completed per clock cycle, because every mispredicted branch forces more in-flight work to be thrown away. An inefficient branch predictor makes the problem worse. Fortunately, Intel claims to have enhanced both the static and dynamic branch prediction algorithms. Nevertheless, Prescott’s new 31-stage execution pipeline does have an adverse effect on performance, as you’ll see in the benchmarks.
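To get a feel for why pipeline depth and branch prediction are tied together, here is a rough back-of-the-envelope model. It is a sketch only: it assumes a misprediction costs about one pipeline's worth of cycles, and the base CPI, branch fraction, and mispredict rate are hypothetical values picked for illustration, not measurements of any Pentium 4.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload figures, chosen only for illustration.
BASE_CPI = 1.0          # cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.20  # share of instructions that are branches
MISPREDICT_RATE = 0.05  # share of branches predicted wrongly

# Assumption: the flush penalty roughly equals the pipeline depth in stages.
cpi_20 = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, flush_penalty=20)
cpi_31 = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, flush_penalty=31)

# Mispredict rate the 31-stage design would need to match the 20-stage CPI.
break_even_rate = MISPREDICT_RATE * 20 / 31

print(f"20-stage CPI: {cpi_20:.3f}")
print(f"31-stage CPI: {cpi_31:.3f}")
print(f"Break-even mispredict rate at 31 stages: {break_even_rate:.4f}")
```

Under these made-up numbers, the longer pipeline only holds its per-clock throughput if the predictor gets noticeably better, which is consistent with Intel reworking its prediction algorithms.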
Why, then, did Intel change the pipeline? Think back to November of 2000, when Intel first unveiled the Pentium 4 running at 1.5GHz. Although it seemed significantly faster than the 1GHz Pentium III on paper, the Pentium 4’s 20-stage pipeline kept it from meaningfully outperforming its competition. Look where it is today, though. At 3.4GHz, the Northwood processor is much faster than that first Pentium 4, and all because the deeper pipeline gave Intel the clock-speed headroom it needed. Apparently, Intel’s engineers are confident that the revised NetBurst micro-architecture will scale to 4GHz by the end of 2004.
![Prescott layout](images/07-s.jpg) Prescott layout
SSE3
It was hard to speculate on the effect SSE2 would have during the 1.5GHz Pentium 4 launch. After all, MMX never really took off, and given the normal development cycle of software, it’d be at least a year before properly optimized titles started emerging. But as it turned out, SSE2 really made a difference. If you want concrete proof, look back at how AMD’s Athlon 64 scored in Content Creation 2003 before and after it was patched for proper processor recognition.
SSE3 is a much smaller extension of the IA-32 ISA, totaling 13 instructions intended to improve performance in complex arithmetic, video encoding, graphics, and thread synchronization. Version 8.0 of Intel’s C++ Compiler for Windows already supports SSE3 optimizations, making it easier for developers to start employing the new instructions.
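To illustrate the complex arithmetic case, here is a pure-Python model of three SSE3 instructions (MOVSLDUP, MOVSHDUP, and ADDSUBPS) multiplying two pairs of complex numbers packed as `[re0, im0, re1, im1]`. It only mimics the lane behavior on lists; it is not real SIMD code, and the helper names are made up for this sketch. The pair swap stands in for an ordinary SSE shuffle, which is not part of SSE3.

```python
def movsldup(v):
    """Duplicate the even lanes: [v0, v0, v2, v2]."""
    return [v[0], v[0], v[2], v[2]]


def movshdup(v):
    """Duplicate the odd lanes: [v1, v1, v3, v3]."""
    return [v[1], v[1], v[3], v[3]]


def addsubps(a, b):
    """Subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]


def mulps(a, b):
    """Lane-wise multiply (plain SSE, modeled here for completeness)."""
    return [x * y for x, y in zip(a, b)]


def swap_pairs(v):
    """Swap real and imaginary parts within each pair (stands in for a shuffle)."""
    return [v[1], v[0], v[3], v[2]]


def complex_mul_packed(a, b):
    """Multiply two packed complex pairs: (ar + ai*i) * (br + bi*i)."""
    real_terms = mulps(movsldup(a), b)               # [ar*br, ar*bi, ...]
    imag_terms = mulps(movshdup(a), swap_pairs(b))   # [ai*bi, ai*br, ...]
    return addsubps(real_terms, imag_terms)          # [ar*br - ai*bi, ar*bi + ai*br, ...]


# (1 + 2i) * (3 + 4i) and (3 - 1i) * (2 + 5i)
product = complex_mul_packed([1.0, 2.0, 3.0, -1.0], [3.0, 4.0, 2.0, 5.0])
print(product)
```

Without the add/subtract and duplicate instructions, the same result needs extra shuffles and sign flips, which is the kind of overhead SSE3 is meant to trim.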