Introduction
The Memory Gap
If you asked a group of gamers what the most important part of a computer is after the CPU, they would most likely say the graphics card. If you asked a CPU designer, however, he would probably say the memory subsystem. For most of the history of the PC, even the most informed hardware enthusiasts have had little to say about the type of memory placed in their systems, only the amount. That's about to change. Following its own lead from the Socket 7 vs. Slot 1 vs. Slot A debacle of the last few years, Intel is charging ahead with a plan that is all but guaranteed to fracture the memory market with incompatible systems. Once again, Intel is willing to pit its processor and chipset divisions against the combined power of AMD and chipset maker VIA Technologies. The fight is certainly not going to be boring.
Why?
The big problem with memory is that, over the past decade, memory speed hasn't been able to keep pace with Moore's law. Named after Intel co-founder Dr. Gordon Moore, Moore's law is popularly taken to predict that the complexity and power of microprocessors will double every eighteen months. For the past twenty years, that prediction has held remarkably well. Unfortunately, the memory system that processors rely on has not advanced as quickly.
The first processor to confront this speed limitation was the 486DX2 series CPU, which ran at twice the speed of its front side bus (which connects the processor to the memory). The 2x clock multiplier was somewhat controversial at the time, since some analysts said that the DX2 would run much faster than it could fetch data from memory, and they were partially correct: doubling the internal clock speed yielded a performance increase of only about 1.5x. However, since memory technologies faster than regular DRAM were not yet ready, Intel pressed on along the Moore's law path by raising clock multipliers. Today's Katmai Pentium III-600 runs at a 6x multiplier, meaning the processor completes six clock cycles in the time the memory bus completes one!
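The multiplier arithmetic above can be sketched in a few lines of Python. This is only an illustration, not a measurement: the function names are made up for this example, the 100 MHz bus speed is the figure implied by a 600 MHz part with a 6x multiplier, and the second half assumes a simple Amdahl's-law-style model in which some fraction of run time scales with the core clock while the rest waits on memory and does not speed up at all.

```python
from fractions import Fraction


def core_clock_mhz(bus_mhz, multiplier):
    """Core clock is the front side bus clock times the clock multiplier."""
    return bus_mhz * multiplier


def overall_speedup(core_bound_fraction, core_speedup):
    """Amdahl-style estimate (an assumed model, not a benchmark).

    core_bound_fraction: share of the original run time that scales
    with the core clock; the remainder is assumed to be memory-bound.
    core_speedup: how much faster the core clock has become.
    """
    f = Fraction(core_bound_fraction)
    return 1 / ((1 - f) + f / core_speedup)


def core_bound_fraction(observed_speedup, core_speedup):
    """Invert the model: what core-bound share explains the observed gain?"""
    s = Fraction(observed_speedup)
    k = Fraction(core_speedup)
    return (1 - 1 / s) / (1 - 1 / k)


# A 6x multiplier on a 100 MHz bus gives a 600 MHz core clock.
print(core_clock_mhz(100, 6))

# Clock doubling that yields only 1.5x overall implies, under this
# model, that two thirds of the original run time was core-bound.
print(core_bound_fraction(Fraction(3, 2), 2))
```

Under this simplified model, the DX2's 1.5x result means roughly one third of the original run time was spent waiting on something that clock doubling did not help, which is consistent with the analysts' worry about memory.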
How can they do that?
Since a read request generally takes two or three memory cycles to be processed and fulfilled, other technologies are obviously needed to bridge this gap between processor and memory performance. One of these technologies is the multi-layered cache design of modern systems. Cache is a small amount of extremely fast RAM that is physically and logically closer to the CPU than regular memory. To understand how a cache works, you first have to understand how a CPU functions. The following is a short version of the explanation found in our G4 and IA-64 articles.
CPUs execute large numbers of instructions in sequence when running a program. There are around five or six steps to process each instruction, depending on the hardware and the complexity of the instruction. Modern CPUs are pipelined and superscalar, meaning they are able to work on several steps and several instructions in one cycle. The result is that, instead of taking five cycles to complete an instruction, a modern processor averages between 1 and 2 cycles per instruction. Sometimes it can even complete multiple instructions in one cycle! This places more pressure on the memory to feed the processor a sufficient stream of instructions and data.
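To see why pipelining pulls the average toward one cycle per instruction, here is a minimal, idealized cycle count in Python. The function names and the `issue_width` parameter are hypothetical, and the model assumes a pipeline that never stalls, which no real processor achieves; stalls on branches and on memory are what push real-world averages back up toward the 1 to 2 cycles mentioned above.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction runs through every stage before the next begins."""
    return instructions * stages


def pipelined_cycles(instructions, stages, issue_width=1):
    """Ideal pipeline with no stalls (an assumption, not a real CPU).

    The first group of instructions takes `stages` cycles to flow
    through; after that, one group of `issue_width` instructions
    completes every cycle. issue_width > 1 models a superscalar design.
    """
    if instructions == 0:
        return 0
    groups = (instructions + issue_width - 1) // issue_width  # ceiling division
    return stages + groups - 1


def cycles_per_instruction(total_cycles, instructions):
    """Average cycles per instruction for a whole run."""
    return total_cycles / instructions


n, stages = 1000, 5
print(unpipelined_cycles(n, stages))               # 5000 cycles, 5 per instruction
print(pipelined_cycles(n, stages))                 # 1004 cycles, just over 1 per instruction
print(pipelined_cycles(n, stages, issue_width=2))  # 504 cycles, about 0.5 per instruction
```

The faster the processor retires instructions in this model, the more instructions and data it needs delivered per cycle, which is exactly the pressure on the memory system described above.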