New architecture
According to NVIDIA, work began on the G80 GPU powering the GeForce 8800 GTX and GeForce 8800 GTS in the summer of 2002. G80 is a massively parallel, unified shader design that combines NVIDIA’s stream processors with the company’s so-called GigaThread technology to deliver these new levels of performance. Unlike the GeForce 7900 GTX, this is a completely new architecture designed from scratch for DirectX 10. In fact, NVIDIA boasts that the GeForce 8800 GTX delivers 2X the performance of the GeForce 7900 GTX in current applications, and up to 11X the performance in certain shading operations. Let’s go over some of the key features in this new architecture.
Unified shader architecture
In previous graphics architectures, both ATI and NVIDIA incorporated a number of distinct pixel and vertex shading units, each dedicated solely to its own task. Pixel shaders worked on pixel operations, while vertex shaders dealt only with vertices. No work was mixed or shared between them.
This was done because in previous versions of DirectX, pixel and vertex shaders weren’t created equal. Pixel and vertex units worked on separate instruction sets that were tailored to those specific applications; pixel and vertex shaders, for instance, supported different instruction limits and different numbers of constant registers.
Under DirectX 10, however, all shaders rely on the same instruction set and support the same number of registers and inputs. Each shader is a general-purpose programmable floating-point shader, whether it’s pixel, vertex, or geometry. In other words, no one shading unit is more functional than another. This allows the shaders to operate on any type of data, whether it’s a pixel program or a task for the vertex shader. As a result, both performance and efficiency increase.
Take the example above, for instance. Under previous graphics architectures, the vertex shaders in the top scenario are heavily taxed, while the pixel shaders are basically idling. Considering the 3:1 ratio of pixel to vertex shaders in previous NVIDIA GPUs, this equates to a large portion of the GPU’s shading engine essentially going unused!
In the bottom (water) scenario it’s the exact opposite: the vertex shaders are idling while the pixel shaders are working full tilt.
Under a unified architecture, the shading units can work on any task, so more of them can be assigned to pixel work when a scene is pixel-intensive, and more to vertex work when it’s vertex-intensive. This can lead to substantial performance and efficiency gains, and just as importantly, it’s invisible to software developers. GeForce 8800’s unified shaders can also be used seamlessly with DirectX 9 and older DirectX versions, as well as OpenGL.
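To put rough numbers on the load-balancing argument, here is a minimal sketch in Python. It is an idealized model, not NVIDIA's scheduler: the unit counts (24 pixel and 8 vertex units, chosen to match the 3:1 ratio mentioned above), the workload figures, and all function names are assumptions made for illustration, and the model assumes work divides perfectly across units with no scheduling overhead.

```python
# Idealized comparison of fixed vs. unified shader units.
# All numbers are hypothetical; they are not measured GeForce data.

PIXEL_UNITS = 24   # assumed dedicated pixel shaders (3:1 ratio)
VERTEX_UNITS = 8   # assumed dedicated vertex shaders
TOTAL_UNITS = PIXEL_UNITS + VERTEX_UNITS


def fixed_frame_time(vertex_work, pixel_work):
    """Time units needed when each unit type can only run its own work.

    The frame is finished only when the slower of the two pools is done.
    """
    return max(vertex_work / VERTEX_UNITS, pixel_work / PIXEL_UNITS)


def unified_frame_time(vertex_work, pixel_work):
    """Time units needed when every unit can run either kind of work."""
    return (vertex_work + pixel_work) / TOTAL_UNITS


def utilization(vertex_work, pixel_work, frame_time):
    """Fraction of available unit-time that was spent doing useful work."""
    return (vertex_work + pixel_work) / (TOTAL_UNITS * frame_time)


if __name__ == "__main__":
    # Hypothetical workloads, in arbitrary "work items" per frame.
    scenarios = {
        "vertex-heavy": (800, 240),
        "pixel-heavy": (80, 2400),
    }
    for name, (v, p) in scenarios.items():
        fixed = fixed_frame_time(v, p)
        unified = unified_frame_time(v, p)
        print(name,
              "fixed:", fixed, "utilization:", utilization(v, p, fixed),
              "| unified:", unified, "utilization:", utilization(v, p, unified))
```

In this toy model the vertex-heavy frame leaves most of the fixed design idle, because the 24 pixel units finish early and wait on the 8 vertex units, while the unified design keeps every unit busy. Real hardware will not reach the ideal figure, but the direction of the effect is the one described above.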
Before we go further, it’s important to note that a unified shader architecture is not a requirement of DirectX 10. Technically, DX10 only requires a unified instruction set. But according to NVIDIA: “GeForce 8800 engineers believed a unified GPU shader architecture made most sense to allow effective DirectX 10 shader program load-balancing, efficient GPU power utilization, and significantly improved GPU architectural efficiency.”