128 Stream Processors
At the heart of NVIDIA’s new G80 GPU are its new shading units, which NVIDIA dubs “stream processors”. As we just mentioned, they’re equally at home operating on vertex, pixel, geometry, or physics workloads. Controlling it all is the GPU’s dispatch and control logic, which can dynamically assign shading tasks to the stream processors for maximum efficiency.
Stream Processors and GigaThread
For the past few years, NVIDIA has been analyzing the thousands of shader programs used in the latest games. That analysis showed, for instance, that multiply-add (MADD) operations are among the most commonly used math functions, so for G70 NVIDIA doubled the number of multiply-add instructions each pixel pipe could issue, increasing pixel shader throughput.
For G80 the keyword is scalar computation. NVIDIA’s engineers found that scalar operations were becoming increasingly common in shader code, and that they were difficult to compile and schedule efficiently on a traditional vector-based GPU like G70. For G80’s shading processors (the so-called “stream” processors), NVIDIA therefore adopted a scalar architecture. Vector-based shader programs are converted to scalar operations inside G80 to ensure efficiency.
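To make the vector-to-scalar idea concrete, here is a minimal Python sketch. It is not NVIDIA's compiler or instruction set: the function names, the assembly-like text, and the example shader mix are all hypothetical. The utilization model is deliberately simple (each operation occupies a whole 4-wide vector unit for one issue slot) and ignores the co-issue tricks real vector GPUs use.

```python
# Illustrative sketch only: hypothetical names, not NVIDIA's actual compiler.

def scalarize_mad(dst, a, b, c, width=4):
    """Expand one `width`-component MAD (dst = a * b + c) into scalar MADs."""
    components = "xyzw"[:width]
    return [f"MAD {dst}.{k}, {a}.{k}, {b}.{k}, {c}.{k}" for k in components]

def vector_unit_utilization(op_widths, lanes=4):
    """Fraction of lanes doing useful work when every operation, however
    narrow, occupies a full `lanes`-wide vector unit for one issue slot."""
    used = sum(op_widths)
    available = lanes * len(op_widths)
    return used / available

# One vec4 MAD becomes four independent scalar MADs.
scalar_ops = scalarize_mad("r0", "r1", "r2", "r3")
for op in scalar_ops:
    print(op)

# Hypothetical shader with operations of 4, 1, 3 and 1 components.
mixed_shader = [4, 1, 3, 1]
print(vector_unit_utilization(mixed_shader))  # 9 useful lanes out of 16
```

In this toy model the scalar-heavy shader keeps just over half of the vector lanes busy, while the same nine component operations would fill nine scalar issue slots with nothing idle. That is the inefficiency a scalar design aims to remove.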
Each stream processor can dual-issue a MADD and a MUL instruction, and supports IEEE 754 floating-point.
Shown above is a block diagram of G80. The little green squares in the diagram are the stream processors and, as you can see, each group of stream processors has its own dedicated texture address and filtering units as well as L1 cache. The stream processors are arranged into groups of sixteen. For ease of use we’ll refer to each group of sixteen as a “bank” of stream processors. With 16 stream processors per bank and 8 banks total, the GeForce 8800 GTX has 128 stream processors. Two banks are deactivated in the GeForce 8800 GTS, leaving it with 96.
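The bank arithmetic can be written out as a quick sanity check. The constant and function names below are ours, chosen for illustration; the figures themselves come from the paragraph above.

```python
# Figures from the article; names are illustrative.
SPS_PER_BANK = 16   # stream processors in each bank
TOTAL_BANKS = 8     # banks on a full G80

def stream_processors(active_banks, per_bank=SPS_PER_BANK):
    """Total stream processors for a given number of enabled banks."""
    return active_banks * per_bank

gtx_sps = stream_processors(TOTAL_BANKS)        # all eight banks enabled
gts_sps = stream_processors(TOTAL_BANKS - 2)    # two banks deactivated

print(f"GeForce 8800 GTX: {gtx_sps} stream processors")
print(f"GeForce 8800 GTS: {gts_sps} stream processors")
```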
The stream processors run at their own clock speed that’s independent of the rest of the graphics core. In the GeForce 8800 GTX for instance, the stream processors run at 1.35GHz, while the rest of the GPU runs at 575MHz. If you recall, NVIDIA decoupled the clocks on G70/G71 as well, where the vertex shaders ran slightly faster than the rest of the graphics core.
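As a rough illustration of what the decoupled shader clock means, the sketch below computes the shader-to-core clock ratio and a theoretical peak arithmetic rate for the GeForce 8800 GTX. The flop-counting convention is our assumption, not something stated here: a MAD is counted as two floating-point operations and the dual-issued MUL as one. Real shaders will not sustain this peak.

```python
# Clock speeds and processor count as quoted in the article (8800 GTX).
STREAM_PROCESSORS = 128
SHADER_CLOCK_MHZ = 1350
CORE_CLOCK_MHZ = 575

# Assumption (not from the article): MAD = 2 flops, dual-issued MUL = 1 flop,
# giving 3 floating-point operations per stream processor per clock.
FLOPS_PER_CLOCK = 3

def peak_gflops(processors, clock_mhz, flops_per_clock=FLOPS_PER_CLOCK):
    """Theoretical peak in GFLOPS: processors x clock x flops per clock."""
    return processors * clock_mhz * flops_per_clock / 1000

clock_ratio = SHADER_CLOCK_MHZ / CORE_CLOCK_MHZ
gtx_peak = peak_gflops(STREAM_PROCESSORS, SHADER_CLOCK_MHZ)

print(f"shader/core clock ratio: {clock_ratio:.2f}")
print(f"theoretical peak: {gtx_peak:.1f} GFLOPS")
```

Under these assumptions the stream processors run at roughly 2.3 times the core clock, and the theoretical peak works out to a little over 518 GFLOPS.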
GigaThread technology refers to G80’s use of threading. Both ATI’s and NVIDIA’s GPUs have supported threading in the past, although ATI’s R5xx GPUs were widely regarded as capable of handling many more threads per pixel shader quad, with much finer thread granularity, than NVIDIA’s G70 and NV40.
NVIDIA doesn’t provide specifics on how threading has been improved in G80, saying only that “thousands” of threads can be in flight within G80 at any given point. NVIDIA does boast finer thread granularity, however: while ATI’s R580 provided a granularity of 48 pixels, NVIDIA claims a 32-pixel granularity for pixel shader programs.
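Why granularity matters can be shown with a simplified model. We assume pixels are processed in fixed-size batches and that a partly filled batch still occupies all of its slots. Neither vendor documents its scheduling at this level here, so treat the function and the 50-pixel example as hypothetical.

```python
import math

def batch_cost(active_pixels, granularity):
    """Return (batches needed, idle slots) when pixels are processed in
    fixed-size batches and a partial batch still occupies every slot."""
    batches = math.ceil(active_pixels / granularity)
    idle_slots = batches * granularity - active_pixels
    return batches, idle_slots

# Hypothetical case: a small triangle covering 50 pixels.
for name, granularity in (("48-pixel granularity", 48),
                          ("32-pixel granularity", 32)):
    batches, idle = batch_cost(50, granularity)
    print(f"{name}: {batches} batches, {idle} idle slots")
```

In this example both granularities need two batches, but the 48-pixel case leaves 46 slots idle against 14 for the 32-pixel case. The same reasoning applies to branching: the fewer pixels that share a batch, the less work is wasted when they diverge.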
The rest of the GPU details
G80’s texture filtering units are completely decoupled from the stream processors and can deliver up to 64 pixels per clock for raw texture filtering (in comparison to 24 in G70/G71), 32 bilinear-filtered pixels per clock, and 32 pixels per clock of 2X anisotropic filtering. The GeForce 8800 GTX has six ROPs (the GTS has five), each of which can render four pixels per clock, for an effective 24 ROPs in the 8800 GTX and 20 in the 8800 GTS.
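The effective ROP counts follow from simple multiplication, sketched below. The fill-rate line is our own extrapolation: it assumes the ROPs run at the 575MHz core clock quoted earlier for the 8800 GTX and that each effective ROP retires one pixel per clock. The names are illustrative.

```python
# ROP figures from the article; names are illustrative.
PIXELS_PER_ROP_UNIT = 4
CORE_CLOCK_MHZ = 575  # 8800 GTX core clock quoted earlier in the article

def effective_rops(rop_units, pixels_per_unit=PIXELS_PER_ROP_UNIT):
    """Effective ROP count: number of ROP units times pixels per unit."""
    return rop_units * pixels_per_unit

gtx_rops = effective_rops(6)
gts_rops = effective_rops(5)

# Assumption (not from the article): ROPs run at the core clock and each
# effective ROP retires one pixel per clock.
gtx_fill_gpixels = gtx_rops * CORE_CLOCK_MHZ / 1000

print(f"8800 GTX: {gtx_rops} effective ROPs")
print(f"8800 GTS: {gts_rops} effective ROPs")
print(f"8800 GTX theoretical fill rate: {gtx_fill_gpixels:.1f} Gpixels/s")
```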
We’ll discuss the memory subsystem in more detail on the next page, as it differs a little between the 8800 GTX and 8800 GTS. For now we can say that G80 continues to use a high-speed crossbar design with 64-bit memory controllers. (ATI uses 32-bit controllers for R520/R580, which lets its memory controller serve more read/write requests simultaneously.) The controllers themselves support DDR1, DDR2, DDR3, GDDR3, and GDDR4 memory types.
Quantum Effects
Before we discuss the differences between the 8800 GTX and GTS, and the cards themselves, we want to briefly cover NVIDIA’s “Quantum Effects” technology. The term simply refers to NVIDIA’s use of Havok FX for physics processing on the GPU. If you recall, Havok FX brings physics processing to any Shader Model 3.0 (or greater) GPU. This includes the GeForce 6 and 7 series, although with lower performance. As we mentioned earlier, physics processing is handled by the G80 GPU’s stream processors, not by a dedicated physics processing unit.