The Limitations of Caches
Why don't you just add more cache?
Unfortunately, caches can only do so much. While adding more and larger caches alleviates some problems, it creates others. Caches must be invisible to software to maintain backwards compatibility, so they must operate completely in hardware and provide the illusion that the processor is speaking directly to main memory. To accomplish this, processor architects have developed a number of algorithms that read ahead of the processor, keep the cache filled with the most relevant information, and dump out data that is no longer necessary. Because of the speed of a modern processor, these functions must be implemented in hardware, adding complexity and cost to the processor and the memory subsystem.
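To make the "dump out data that is no longer necessary" part concrete, here is a minimal sketch of one common replacement policy, least recently used (LRU). It is a software toy, not how any particular processor does it: the `LRUCache` class, the 8-line capacity, and the two access patterns are all assumptions picked for illustration, and read-ahead (prefetching) is left out entirely.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # cached line addresses, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, line_address):
        """Touch one cache line; return True on a hit, False on a miss."""
        if line_address in self.lines:
            self.lines.move_to_end(line_address)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line_address] = True
        return False


def hit_rate(cache, addresses):
    """Run an access pattern through the cache and return the hit fraction."""
    for address in addresses:
        cache.access(address)
    total = cache.hits + cache.misses
    return cache.hits / total if total else 0.0


# A loop over 4 lines fits comfortably in an 8-line cache.
small_loop = [0, 1, 2, 3] * 25
# A loop over 16 lines does not: each line is evicted before it is reused.
big_loop = list(range(16)) * 25

print(hit_rate(LRUCache(8), small_loop))  # 0.96
print(hit_rate(LRUCache(8), big_loop))    # 0.0
```

The second result is the interesting one. Once the working set outgrows the cache, this policy stops helping at all, and every access goes out to main memory.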
Even the best cache algorithm is useless in some situations, such as multitasking. Modern operating systems use a technique called virtual memory to provide every process with its own protected memory space. Each virtual memory address has to map to physical memory, a translation that is sped up by a part of the processor called the translation lookaside buffer (TLB), which caches recently used mappings. The placement of each virtual address in physical memory is determined by the operating system, and the data for different processes is often far apart in memory.
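The translation step can be sketched in a few lines. This is a simplified model under stated assumptions: 4 KiB pages, a flat one-level page table held in a dictionary, and a TLB that never fills up. The `AddressSpace` class and the frame numbers are hypothetical, chosen only to show two processes using the same virtual address while landing far apart in physical memory.

```python
PAGE_SIZE = 4096  # assumed page size of 4 KiB


class AddressSpace:
    """Toy per-process address space: a page table plus a TLB in front of it."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # recently used translations (unbounded here)
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, virtual_address):
        """Return the physical address for a virtual address."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.tlb:
            self.tlb_hits += 1
            frame = self.tlb[page]
        else:
            # TLB miss: fall back to the slower page table lookup.
            # A missing entry (KeyError) would correspond to a page fault.
            self.tlb_misses += 1
            frame = self.page_table[page]
            self.tlb[page] = frame
        return frame * PAGE_SIZE + offset


# Hypothetical mappings chosen by the operating system for two processes.
process_a = AddressSpace({0: 7, 1: 300})
process_b = AddressSpace({0: 9000, 1: 12})

print(hex(process_a.translate(0x0010)))  # 0x7010
print(hex(process_b.translate(0x0010)))  # 0x2328010
```

Both processes asked for virtual address 0x0010, yet the physical locations are megabytes apart, which is why data cached for one process does nothing for the other.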
Flush that cache
In a multitasking environment, each program gets a slice of CPU time, usually around 5 milliseconds or less. When that time is up, the operating system takes over and initiates a context switch. The operating system saves the current state of the processor's registers in memory, changes the contents of the TLB, restores the registers to where the next program left off, and instructs the processor to begin executing the next program. Every time this happens, the information in the cache becomes irrelevant, and in some systems, the cache has to be completely flushed.
In these cases, the speed of the memory system is paramount. Needed data will not be available in the cache, and the processor will sit and cycle uselessly while the memory controller slowly requests the data. The processor's voracious need for information is no longer the only stress on memory.
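A rough simulation shows how much a flush on every switch can hurt. Everything here is an assumption made for illustration: two processes take turns, each loops over its own working set of cache lines, and the toy cache is large enough that it never evicts anything unless it is flushed. The `run` function and its parameters are hypothetical.

```python
def run(slices, accesses_per_slice, working_set, flush_on_switch):
    """Return the cache hit rate for two processes alternating time slices."""
    cache = set()  # toy cache: holds (process, line) pairs, never evicts
    hits = 0
    misses = 0
    for time_slice in range(slices):
        process = time_slice % 2  # processes 0 and 1 take turns
        if flush_on_switch:
            cache.clear()  # context switch throws away everything cached
        for i in range(accesses_per_slice):
            line = (process, i % working_set)
            if line in cache:
                hits += 1
            else:
                misses += 1  # this access has to go to main memory
                cache.add(line)
    return hits / (hits + misses)


# Long time slices: the flush costs something, but the cache warms back up.
print(run(10, 1000, 100, flush_on_switch=False))  # 0.98
print(run(10, 1000, 100, flush_on_switch=True))   # 0.9

# Short time slices: half of all accesses now wait on main memory.
print(run(10, 200, 100, flush_on_switch=True))    # 0.5
```

The shorter the time slice relative to the working set, the more of each slice is spent refilling the cache, and the more the raw speed of main memory decides overall performance.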
Direct Memory Slappin'
The classic PC architecture allowed only the processor to read and write main memory. It quickly became apparent that using the processor to copy information from the hard drive to memory or from memory to the graphics card was a waste of resources. Intel introduced direct memory access (DMA), a standard that allows peripherals to read and write memory through the memory controller, which is usually part of what is known as the north bridge chip. The north bridge connects the processor, the memory, and the peripheral buses.
The AGP port allows a graphics card to read textures directly from memory, and with AGP 4x this operation can happen at 1.06GB/s. Likewise, the new IEEE 1394 high speed data port, commonly found on new systems and digital video cameras, can write to main memory at up to 800Mbit/s, or roughly 100MB/s. Considering that PC100 SDRAM has a maximum theoretical bandwidth of 800MB/s, and a real throughput of about half that, it's obvious that a new memory system is necessary.
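The peak figures above fall out of simple arithmetic: clock rate times transfers per clock times bus width. The sketch below works through it, assuming the usual published parameters (a 64-bit PC100 bus at 100MHz, and a 32-bit AGP bus at roughly 66.67MHz moving four transfers per clock in 4x mode). The function name is hypothetical, and the 50% efficiency figure is simply the "about half" estimate from the text.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_width_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


# PC100 SDRAM: 100 MHz, 64-bit data bus, one transfer per clock.
pc100_peak = peak_bandwidth_mb_s(100, 64)
# "About half that" in practice, per the estimate in the text.
pc100_real = pc100_peak * 0.5

# AGP 4x: ~66.67 MHz, 32-bit bus, four transfers per clock.
agp4x_peak = peak_bandwidth_mb_s(200 / 3, 32, transfers_per_clock=4)

print(round(pc100_peak))  # 800
print(round(pc100_real))  # 400
print(round(agp4x_peak))  # 1067
print(round(agp4x_peak / pc100_real, 1))  # 2.7
```

In other words, a graphics card pulling textures at full AGP 4x speed could ask for well over twice what PC100 memory realistically delivers, before the processor has requested a single byte.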