The architecture
For R520, ATI’s design emphasis was dynamic looping/branching (flow control). ATI felt that this feature was one of the key new attributes of Shader Model 3.0 (SM 3.0), and that a growing number of game developers in the SM 3.0 generation would use it not only to make their games easier to write, but also to improve performance. With this in mind, ATI designed R520 accordingly, breaking down the pixel processing workload into a large number of small threads and managing it all with an ultra-threading dispatch processor, a central dispatch unit that tracks and distributes up to 512 threads across the RADEON X1800’s shader processors. (R520 also has dedicated branch units for each quad pixel shader core.) All this emphasis on flow control makes R520 more efficient than NVIDIA’s G70 at handling dynamic looping and branching, but SM 3.0 games that make extensive use of flow control haven’t been released yet.
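To see why small threads help with dynamic branching, here is a toy model in Python. It is only a sketch under stated assumptions: every pixel in a thread is assumed to run in lockstep, so a thread whose pixels disagree on a branch pays for both sides. The function name, the thread sizes, and the per-pixel costs are all hypothetical and are not ATI’s figures.

```python
def shading_cost(taken, thread_size, cheap=1, expensive=10):
    """Total cost of shading a row of pixels split into fixed-size threads.

    taken[i] is True when pixel i takes the expensive side of a branch.
    Assumption (illustrative only): pixels in one thread execute in
    lockstep, so a thread with mixed outcomes runs both branch paths.
    """
    total = 0
    for start in range(0, len(taken), thread_size):
        batch = taken[start:start + thread_size]
        cost_per_pixel = 0
        if any(batch):          # at least one pixel needs the expensive path
            cost_per_pixel += expensive
        if not all(batch):      # at least one pixel needs the cheap path
            cost_per_pixel += cheap
        total += cost_per_pixel * len(batch)
    return total


# Hypothetical scene: 256 pixels, only pixels 100..139 take the expensive path.
taken = [100 <= i < 140 for i in range(256)]

for size in (1, 16, 256):
    print(f"thread size {size:>3}: cost {shading_cost(taken, size)}")
```

In this model, the smaller the thread, the fewer pixels get dragged through a branch path they do not need, which is the intuition behind splitting the workload into many small threads.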
![R580’s pixel shaders](images/05-s.png)
R580’s pixel shaders
With their follow-up R580 part, ATI has turned their attention to another area: speeding up pixel shading. As has been widely reported leading up to the Radeon X1900 launch, the X1900’s R580 GPU is ATI’s first graphics chip to be equipped with 48 pixel processors. It’s important to note, however, that R580 isn’t a 48 pixel pipeline part. Starting with the R5xx series, ATI has begun to decouple the pixel shaders from the texture units, so the traditional “pixel pipeline” as we knew it for years in R300 and R420/R480 no longer applies. The following chart sums things up nicely:
High-End GPU Comparison

| GPU | Pixel Shaders | Vertex Shaders | Texture address units | ROPs | Max threads |
| --- | --- | --- | --- | --- | --- |
| Radeon X1900 XTX/XT | 48 | 8 | 16 | 16 | 512 |
| Radeon X1800 XT/XL | 16 | 8 | 16 | 16 | 512 |
| GeForce 7800 GTX | 24 | 8 | 24 | 16 | N/A |
| GeForce 7800 GT | 20 | 8 | 20 | 16 | N/A |

ATI’s making a pretty bold statement with R580. With 48 pixel processors and only 16 texture units and ROPs, ATI is betting that game developers will increasingly use pixel shaders to create their stunning effects. After all, pixel shaders are not only growing more common, but also more complex, with a growing share of shader instructions in games being arithmetic (versus texture) operations. ATI also argues that improving texture performance depends more heavily on subsequent improvements in memory bandwidth and size. With the fastest 900MHz GDDR3 memory in short supply and selling at a premium, improving the performance of texture operations isn’t as cost effective. The exorbitant $700+ street price and limited availability of the GeForce 7800 GTX 512MB is a good example of just how constrained the high-end GDDR3 memory market is right now.
ATI feels that focusing on pixel shading is the most cost effective way to dramatically improve R580’s performance. In ATI’s words: “By adding 20% more transistors, shader processing power is increased by 200%”. Of course, outside of synthetic benchmarks, you won’t see R580 delivering three times the performance of R520 (a 200% improvement) in real-world games. But you can see what ATI’s thinking.
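The arithmetic behind that 200% figure follows directly from the unit counts quoted in this article. A minimal sketch in Python, assuming every pixel shader does the same work per clock and ignoring clock speed differences between the cards (both simplifications for illustration); the dictionary and function names are ours, not ATI’s:

```python
# Unit counts as quoted in the article's GPU comparison chart.
GPUS = {
    "Radeon X1900 XTX/XT": {"pixel_shaders": 48, "texture_units": 16},
    "Radeon X1800 XT/XL": {"pixel_shaders": 16, "texture_units": 16},
    "GeForce 7800 GTX": {"pixel_shaders": 24, "texture_units": 24},
    "GeForce 7800 GT": {"pixel_shaders": 20, "texture_units": 20},
}


def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


def shader_to_texture_ratio(gpu):
    """Pixel shaders per texture address unit."""
    return gpu["pixel_shaders"] / gpu["texture_units"]


r580 = GPUS["Radeon X1900 XTX/XT"]
r520 = GPUS["Radeon X1800 XT/XL"]

# Going from 16 to 48 pixel shaders is a 200% increase, i.e. three times as many.
shader_gain = percent_increase(r580["pixel_shaders"], r520["pixel_shaders"])
print(f"Pixel shader increase, R520 -> R580: {shader_gain:.0f}%")

for name, gpu in GPUS.items():
    print(f"{name}: {shader_to_texture_ratio(gpu):.0f}:1 shader-to-texture ratio")
```

The same numbers show why the gain is theoretical: texture units and ROPs stay at 16, so any workload limited by texturing or fill-rate sees none of the extra shader throughput.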
With R580, ATI’s looking towards a future where shader-heavy games are the norm, rather than at many of today’s titles, which are more dependent on texturing and raw fill-rate. In those texture- and fill-rate-bound titles, R580 will perform more like its predecessor. Once you crank up the AA and AF, the two cards will perform even more similarly.
ATI is also quick to point out that the Xbox 360’s Xenos GPU has 48 shaders and 16 texture units as well (although in the case of Xenos, the pixel and vertex shaders are unified, whereas R580 has 48 dedicated pixel shaders), so R580’s 3:1 ratio of pixel shaders to texture units isn’t new to game developers. In other words, don’t be surprised if that future becomes a reality sooner than you think.
On the vertex processing side, ATI carries over the R520’s basic design, with 8 vertex shaders. R580’s vertex shaders continue to lack support for SM 3.0’s vertex texture fetch feature, which allows vertex shaders to read from texture memory. To date, only 1C: Maddox Games has used the feature, which was recently added via beta patch to their WW2 sims IL-2 Sturmovik and Pacific Fighters to improve eye candy (specifically relating to the game’s water). An official patch is reportedly in the works from Maddox, although it remains to be seen if they’ll use ATI’s render to vertex buffer (R2VB) workaround to provide similar capabilities to ATI users. (For the record, we didn’t patch IL-2 to run this new mode in order to keep the playing field level between ATI and NVIDIA hardware.) ATI’s R2VB workaround uses the pixel shaders, so it further favors the design changes introduced in R580.
The other notable change ATI has introduced into R580 is that the GPU has 50% more on-chip memory for ATI’s occlusion culling technology known as HyperZ. With more memory onboard, the Radeon X1900 is better positioned to tackle ultra high resolutions such as 2048x1536 or, for 30” LCD users, 2560x1600, particularly with AA.
In the X850/GeForce 6800 generation we noted that ATI had a decisive advantage over NVIDIA at 2048x1536, so perhaps ATI’s hoping for something similar at 2560x1600. Unfortunately, we don’t have a 30” LCD to test this theory.
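For a rough sense of scale, the two resolutions mentioned above can be compared by raw pixel count. This is a back-of-the-envelope sketch in Python only: it counts pixels per frame and does not model how HyperZ actually uses its on-chip memory, which the article does not detail. The helper name is hypothetical.

```python
def pixel_count(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


crt_max = pixel_count(2048, 1536)   # ultra-high CRT resolution
lcd_30 = pixel_count(2560, 1600)    # native resolution of a 30-inch LCD

# How many more pixels per frame the 30-inch panel has, in percent.
extra_percent = (lcd_30 / crt_max - 1) * 100.0

print(f"2048x1536: {crt_max:,} pixels")
print(f"2560x1600: {lcd_30:,} pixels ({extra_percent:.1f}% more)")
```

By this count, 2560x1600 pushes roughly 30% more pixels per frame than 2048x1536, before any AA samples are taken into account.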