New unified architecture
High-end Specifications Comparison
| Specification | Radeon HD 2900 XT | GeForce 8800 GTX | GeForce 8800 GTS 640MB |
|---|---|---|---|
| # of Transistors | 700M | 681M | 681M |
| Core Clock Speed (MHz) | 742 | 575 | 500 |
| Stream Processor Clock Speed (MHz) | 740 | 1350 | 1200 |
| # of Stream Processors | 320 | 128 | 96 |
| Texture Fill-rate (Gigatexels/sec) | 11.9 | 18.4 | 12.0 |
| Memory Clock (MHz, effective) | 1650 | 1800 | 1600 |
| Memory Bandwidth (GB/sec) | 105.6 | 86.4 | 64 |
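The memory bandwidth row follows directly from the memory clock row once the bus width is known. Here is a minimal Python sketch of that arithmetic. The 512-bit figure for R600 comes from this article; the 384-bit and 320-bit widths for the two GeForce cards are not stated here and are filled in as assumptions, and the function and dictionary names are purely illustrative.

```python
# Peak memory bandwidth = bytes moved per transfer x effective memory clock.
# Memory clocks are taken from the comparison table. Only the 512-bit bus width
# is stated in the article; the GeForce bus widths are assumptions.
CARDS = {
    "Radeon HD 2900 XT": {"bus_bits": 512, "mem_clock_mhz": 1650},
    "GeForce 8800 GTX": {"bus_bits": 384, "mem_clock_mhz": 1800},        # assumed width
    "GeForce 8800 GTS 640MB": {"bus_bits": 320, "mem_clock_mhz": 1600},  # assumed width
}


def memory_bandwidth_gb_s(bus_bits, mem_clock_mhz):
    """Return peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 / 1e9


for name, spec in CARDS.items():
    print(f"{name}: {memory_bandwidth_gb_s(**spec):.1f} GB/sec")
```

Under those assumptions the results line up with the table's bandwidth figures, which suggests the listed memory clocks are effective data rates.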
While R600 is technically a new design, the chip leverages many technologies already found in ATI's last two GPUs: R520/580 from the Radeon X1000 series and the Xbox 360's Xenos GPU. Looking over a block diagram of R600, you'll immediately recognize many units from R5xx. For instance, to improve dynamic branching, ATI continues to break the processing workload down into a large number of small threads, which are then managed by the ultra-threading dispatch processor. From the Xbox 360, meanwhile, ATI carries over its unified shading architecture.
For R600, ATI builds on this design, adding more powerful superscalar shader processors, a tweaked ultra-threading dispatch processor, a new tessellation unit (actually borrowed from Xenos), full DirectX 10 support, and a more robust 512-bit memory interface. All this adds up to a GPU that's been designed for the next generation of HD gaming, delivering very high levels of performance even when gaming at mega resolutions such as 2048x1536 and 2560x1600 with HDR+AA.
At the heart of all this is ATI's 2nd-generation unified shader architecture. Consisting of 320 distinct, independent stream processing units, it's quite impressive. Like NVIDIA with its G80 GPU, ATI has adopted a scalar architecture for R600's shading processors (the stream processing units). In ATI's case, however, R600 can issue many more independent instructions in each shader processor thanks to its superscalar design. The Radeon HD 2900 can issue up to five scalar multiply-add (MAD) operations and one branch instruction to each shader processor per clock cycle. In comparison, each stream processor in G80 can dual-issue a MAD and a MUL instruction.
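To put the issue rates above in perspective, here is a rough Python sketch of the peak theoretical shader arithmetic implied by the comparison table. It counts a MAD as two floating-point operations and a MUL as one, which is a common convention rather than anything stated in this article, and the helper name `peak_gflops` is hypothetical. These are paper numbers only: real throughput depends on how often all five slots in a shader processor can actually be filled.

```python
# Peak theoretical shader throughput from the comparison table.
# Assumption: a MAD counts as 2 floating-point ops, a MUL as 1.
MAD_FLOPS = 2
MUL_FLOPS = 1


def peak_gflops(stream_processors, clock_mhz, flops_per_clock):
    """Return peak GFLOPS for a given unit count, clock, and per-unit issue rate."""
    return stream_processors * clock_mhz * 1e6 * flops_per_clock / 1e9


# R600: 320 stream processing units, issued to in groups of five MADs per clock.
r600_groups = 320 // 5
r600 = peak_gflops(320, 740, MAD_FLOPS)

# G80 (GeForce 8800 GTX): 128 stream processors, each dual-issuing MAD + MUL.
g80_mad_only = peak_gflops(128, 1350, MAD_FLOPS)
g80_mad_mul = peak_gflops(128, 1350, MAD_FLOPS + MUL_FLOPS)

print(f"R600 five-wide issue groups: {r600_groups}")
print(f"R600 peak (MAD):             {r600:.1f} GFLOPS")
print(f"G80 peak (MAD only):         {g80_mad_only:.1f} GFLOPS")
print(f"G80 peak (MAD + MUL):        {g80_mad_mul:.1f} GFLOPS")
```

The sketch shows why raw stream processor counts are hard to compare directly: G80 has far fewer units but clocks them much higher, so the gap on paper is much smaller than 320 versus 128 suggests.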
In the block diagram above, the stream processors are depicted as yellow squares in the center of the GPU. Attached to each group of five stream processors is a dedicated branch execution unit (the purple square) and general-purpose registers, which can be used to store input data, temporary values, and output data. Here's a group of stream processors up close:
With more shading units onboard, it's important to keep these shader processors fed with data. This is where the ultra-threading dispatch processor comes in. It acts as a traffic cop: a central dispatch unit responsible for tracking and distributing thousands of threads simultaneously across the Radeon HD 2900's shader processors. ATI won't provide a specific maximum number of threads, but for comparison, R520/580 was limited to 512 threads.
The above diagram breaks down the ultra-threading dispatch processor. As you can see, ATI has provided separate command queues for each shader type: pixel, vertex, and geometry. From there, arbiter units determine which threads will be processed first, based on a variety of parameters. ATI provides two arbiter units per SIMD array, plus dedicated arbiter units for texture and vertex fetches, so fetches can be scheduled independently of math operations. Threads that are already executing can be bumped at any time if a higher-priority thread is pulled from the command queues; the bumped thread's temporary data is saved so it can be resumed later. If a thread is forced to wait for data, it is suspended and a new thread begins executing immediately. Suspended threads remain in the command queue until their requested data arrives. According to ATI, hundreds of threads can be queued up to make sure the SIMD arrays are never sitting idle.
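The suspend-and-switch behavior described above is easiest to appreciate with a toy model. The Python sketch below is not ATI's scheduler: it assumes a single execution unit, a simple first-come queue, no priorities, and no preemption, and the cycle counts (4 cycles of math followed by a 12-cycle data fetch) are made-up illustrative values. It only shows why keeping more threads in flight hides fetch latency.

```python
from collections import deque


def simulate(num_threads, math_cycles, fetch_latency, total_cycles):
    """Toy model: return the fraction of cycles the execution unit is busy.

    Each thread repeatedly does `math_cycles` of work, then waits
    `fetch_latency` cycles for data. A waiting thread is suspended and
    the next ready thread starts immediately.
    """
    ready = deque(range(num_threads))  # threads ready to execute
    waiting = {}                       # thread id -> cycle when its data arrives
    current, remaining = None, 0
    busy = 0

    for cycle in range(total_cycles):
        # Wake any suspended thread whose requested data has arrived.
        for tid in [t for t, due in waiting.items() if due <= cycle]:
            del waiting[tid]
            ready.append(tid)

        # If the unit is free, start the next ready thread.
        if current is None and ready:
            current = ready.popleft()
            remaining = math_cycles

        if current is not None:
            busy += 1
            remaining -= 1
            if remaining == 0:
                # Thread now needs data: suspend it and free the unit.
                waiting[current] = cycle + 1 + fetch_latency
                current = None

    return busy / total_cycles


for n in (1, 2, 4, 8):
    utilization = simulate(n, math_cycles=4, fetch_latency=12, total_cycles=1600)
    print(f"{n} thread(s): {utilization:.0%} busy")
```

With these illustrative numbers, a single thread keeps the unit busy only a quarter of the time, while four or more threads keep it fully occupied. That is the same reasoning behind queuing hundreds of threads for the SIMD arrays.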