GT200 Architecture
More shaders
As shader-intensive DirectX 10 games become more pervasive, the need for shading horsepower keeps growing. To meet it, NVIDIA has increased the number of stream processors from 128 in G80 to 240 in the GeForce GTX 280, while the GeForce GTX 260 has 192. Below are block diagrams of G80 and the GeForce GTX 280:
![G80 block diagram](images/07-s.jpg) G80 block diagram
![GeForce GTX 280 block diagram](images/08-s.jpg) GeForce GTX 280 block diagram
![Another GeForce GTX 280 diagram](images/09-s.jpg) Another GeForce GTX 280 diagram
Each of the light green squares in the above diagram is a stream processor. If you’re patient enough to count every one of them, you’ll find 240 stream processors in total. The stream processors are organized into groups called streaming multiprocessors, each consisting of eight individual stream processors. The streaming multiprocessors are in turn clustered in groups of three to form a texture processing cluster (NVIDIA calls the same unit a thread processing cluster in compute contexts). There are ten of these clusters inside GeForce GTX 280. Compared to G80, GeForce GTX 280 has two additional clusters (10 versus 8), and each cluster has three streaming multiprocessors versus two in G80. As with G80, the stream processors run at their own clock, independent of the rest of the graphics core. In the GeForce GTX 280, for instance, the stream processors run at 1.296GHz, while the rest of the GPU runs at 602MHz.
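The hierarchy described above multiplies out to the quoted stream processor totals. The following is a minimal sketch of that arithmetic; the `ShaderLayout` class and its field names are illustrative and not part of any NVIDIA tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderLayout:
    """Hypothetical helper describing how stream processors are grouped."""
    clusters: int          # processing clusters on the chip
    sms_per_cluster: int   # streaming multiprocessors in each cluster
    sps_per_sm: int = 8    # stream processors in each streaming multiprocessor

    @property
    def streaming_multiprocessors(self) -> int:
        return self.clusters * self.sms_per_cluster

    @property
    def stream_processors(self) -> int:
        return self.streaming_multiprocessors * self.sps_per_sm


# Figures taken from the text: 8 clusters of 2 SMs for G80, 10 clusters of 3 SMs for GTX 280.
G80 = ShaderLayout(clusters=8, sms_per_cluster=2)
GTX_280 = ShaderLayout(clusters=10, sms_per_cluster=3)

if __name__ == "__main__":
    for name, gpu in (("G80", G80), ("GeForce GTX 280", GTX_280)):
        print(f"{name}: {gpu.streaming_multiprocessors} SMs, "
              f"{gpu.stream_processors} stream processors")
```

The same multiplication gives 30 streaming multiprocessors for the GeForce GTX 280 and 16 for G80.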
GeForce GTX 280 also boasts improved threading. Whereas GeForce 8800 GTX was limited to a maximum of 12,288 threads, GeForce GTX 280 supports up to 30,720 concurrent threads in hardware. The thread scheduler balances load dynamically and is highly efficient: if a particular thread stalls waiting for data, the GPU can immediately switch to another thread with no overhead.
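Dividing the thread totals by the number of streaming multiprocessors shows that the gain comes from more threads per multiprocessor as well as from more multiprocessors. This is a rough sketch: the multiprocessor counts are derived from the cluster layout described earlier (8 x 2 and 10 x 3), and the dictionary and function names are made up for illustration.

```python
# Maximum concurrent threads quoted in the text.
MAX_THREADS = {
    "GeForce 8800 GTX": 12_288,
    "GeForce GTX 280": 30_720,
}

# Streaming multiprocessor counts derived from clusters x SMs per cluster.
SM_COUNT = {
    "GeForce 8800 GTX": 8 * 2,
    "GeForce GTX 280": 10 * 3,
}


def threads_per_sm(gpu: str) -> int:
    """Threads each streaming multiprocessor can keep in flight."""
    total, sms = MAX_THREADS[gpu], SM_COUNT[gpu]
    if total % sms != 0:
        raise ValueError("thread total does not divide evenly across SMs")
    return total // sms


def thread_ratio(new: str, old: str) -> float:
    """How many times more threads `new` supports than `old`."""
    return MAX_THREADS[new] / MAX_THREADS[old]


if __name__ == "__main__":
    for gpu in MAX_THREADS:
        print(f"{gpu}: {threads_per_sm(gpu)} threads per SM")
    print("Overall increase:",
          thread_ratio("GeForce GTX 280", "GeForce 8800 GTX"), "x")
```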
With games increasingly using longer, more complex shaders, NVIDIA has doubled the amount of register space in GeForce GTX 200. According to NVIDIA, GeForce 8 and 9 series GPUs were beginning to run into situations where these complex shaders would exhaust the registers, requiring the GPU to swap to memory. With a register file twice the size, these shaders can run without spilling to memory, which improves performance.
Improved texturing
One tweak NVIDIA made in G92 versus G80 was the addition of four texture address units per cluster. This allowed each G92 cluster to address 8 textures and perform 8 texture filtering ops per clock; previously it was 4 and 8 respectively. The end result was that GeForce 9800 GTX could address and filter 64 pixels per clock, whereas GeForce 8800 GTX was limited to 64 pixels per clock of texture filtering and 32 pixels per clock of texture addressing.
With its two additional processing clusters, GeForce GTX 280 can address and filter 80 pixels per clock. NVIDIA also claims that the GeForce GTX 200 series is capable of coming closer to its theoretical texture fill rates thanks to its more efficient scheduler.
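The per-clock texture figures can be reproduced by multiplying units per cluster by cluster count. This sketch reads the 4 and 8 figures quoted for G80 and G92 as per-cluster unit counts, an interpretation that matches the stated totals across eight clusters. The table layout and function names are illustrative, and the fill rate is a theoretical peak based on the 602MHz core clock quoted earlier.

```python
# (clusters, address units per cluster, filter units per cluster)
TEXTURE_UNITS = {
    "GeForce 8800 GTX (G80)": (8, 4, 8),
    "GeForce 9800 GTX (G92)": (8, 8, 8),
    "GeForce GTX 280": (10, 8, 8),
}


def per_clock_rates(gpu: str) -> tuple[int, int]:
    """Return (pixels addressed per clock, pixels filtered per clock)."""
    clusters, address_units, filter_units = TEXTURE_UNITS[gpu]
    return clusters * address_units, clusters * filter_units


def peak_fill_rate_gtexels(pixels_per_clock: int, core_clock_mhz: float) -> float:
    """Theoretical peak texture fill rate in billions of texels per second."""
    return pixels_per_clock * core_clock_mhz / 1000.0


if __name__ == "__main__":
    for gpu in TEXTURE_UNITS:
        addressed, filtered = per_clock_rates(gpu)
        print(f"{gpu}: address {addressed}/clock, filter {filtered}/clock")
    _, gtx280_filtered = per_clock_rates("GeForce GTX 280")
    print("GTX 280 theoretical peak:",
          peak_fill_rate_gtexels(gtx280_filtered, 602), "GTexels/s")
```

Real workloads land below the theoretical figure, which is why NVIDIA's claim about scheduler efficiency matters here.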
Double-Precision Floating Point
As the GPU moves beyond its traditional 2D/3D/gaming workload to computationally intensive scientific and financial computing, it’s important that the GPU can produce highly accurate results. To achieve this, NVIDIA has added double-precision floating point support to the GeForce GTX 200 series. Each streaming multiprocessor has its own double-precision, 64-bit floating point math unit, for a grand total of 30 FPUs on the GPU.
In comparison, GeForce 8/9 series GPUs were limited to 32-bit single-precision floating point.
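To see why the extra precision matters for scientific and financial work, consider a long running sum. The sketch below runs on the CPU and only emulates a 32-bit accumulator by rounding through `struct`; it is not GPU code, and the function names and the choice of workload (adding 0.1 one million times) are arbitrary assumptions made for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single_precision: bool) -> float:
    """Add `step` to a running total `count` times at the chosen precision."""
    if single_precision:
        step = to_f32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_f32(total)  # emulate a 32-bit accumulator
    return total


def accumulation_error(count: int, single_precision: bool) -> float:
    """Absolute error of summing 0.1 `count` times versus the exact answer."""
    exact = count / 10
    return abs(accumulate(0.1, count, single_precision) - exact)


if __name__ == "__main__":
    steps = 1_000_000
    print("single-precision error:", accumulation_error(steps, True))
    print("double-precision error:", accumulation_error(steps, False))
```

The single-precision error comes out many orders of magnitude larger than the double-precision error, which is the kind of drift that 64-bit math units are meant to avoid.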
512-bit memory interface
The GeForce GTX 280 is NVIDIA’s first GPU to sport a 512-bit memory interface. In particular, eight 64-bit memory controllers are used. With a wider, 512-bit path to memory, memory bandwidth is double that of GeForce 9800 GTX at equal memory speeds, but NVIDIA clocks the GTX 280’s memory slightly higher than the 9800 GTX, running at 1,107MHz. This equates to 141.7GB/sec of peak memory bandwidth. NVIDIA continues to use GDDR3 memory due to its lower latency, and the GTX 280 is outfitted with 1GB onboard.
For the GeForce GTX 260, NVIDIA deactivates one memory controller, resulting in a 448-bit memory interface. This is still wider than previous NVIDIA GPUs, including the 8800 GTX’s 384-bit memory interface. Onboard memory for GTX 260 is 896MB and peak bandwidth is 111.9GB/sec.
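The bandwidth figures follow from the bus width and the memory clock. This is a rough sketch, assuming GDDR3 transfers data twice per clock and that GB means 10^9 bytes; the function names are illustrative. The GTX 260's memory clock is not stated here, so the sketch only backs out the clock implied by the quoted 111.9GB/sec.

```python
CONTROLLER_WIDTH_BITS = 64  # each memory controller is 64 bits wide


def bus_width_bits(active_controllers: int) -> int:
    """Total memory interface width for a given number of controllers."""
    return active_controllers * CONTROLLER_WIDTH_BITS


def peak_bandwidth_gb_s(bus_bits: int, memory_clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s (10^9 bytes), assuming double data rate."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9


def implied_clock_mhz(bus_bits: int, bandwidth_gb_s: float,
                      transfers_per_clock: int = 2) -> float:
    """Memory clock implied by a quoted bandwidth figure."""
    bytes_per_transfer = bus_bits / 8
    return bandwidth_gb_s * 1e9 / (bytes_per_transfer * transfers_per_clock) / 1e6


if __name__ == "__main__":
    gtx280_bus = bus_width_bits(8)   # 512-bit
    gtx260_bus = bus_width_bits(7)   # one controller deactivated: 448-bit
    print("GTX 280:", round(peak_bandwidth_gb_s(gtx280_bus, 1107), 1), "GB/s")
    print("GTX 260 implied memory clock:",
          round(implied_clock_mhz(gtx260_bus, 111.9), 1), "MHz")
```

The memory sizes follow the same pattern: 1GB across eight controllers is 128MB per controller, and seven controllers give the GTX 260 its 896MB.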
We’ve provided more details on the specific differences between the GTX 280 and GTX 260 on the next page.