Pixel shading
SMARTSHADER HD
SMARTSHADER HD takes 2.0 pixel and vertex shading to the next level in RADEON X800 by shattering some of the limitations of ATI’s previous architectures, particularly on the pixel shader side. We’ll start with vertex shading, though.
To go alongside its 16 pixel pipelines, ATI has equipped the RADEON X800 series with six vertex processing units. Each unit is able to handle two shader operations per clock cycle at full 32-bit precision. This is twice the processing capability of ATI’s RADEON 9800 PRO. ATI has also dramatically improved performance in trigonometric calculations, specifically sines and cosines, which can now be performed in a single cycle. Common examples ATI cites are grass blowing in the wind and waves on the surface of water.
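To give a feel for the kind of effect ATI is talking about, here is a minimal Python sketch of a sine-driven wave displacement, the sort of per-vertex math a vertex shader would run. The function names and the amplitude, wavelength and speed parameters are our own illustrative choices, not anything taken from ATI's hardware or tools.

```python
import math

def wave_height(x, time, amplitude=0.5, wavelength=4.0, speed=1.0):
    """Height of a travelling sine wave at position x (hypothetical parameters)."""
    phase = 2.0 * math.pi * (x / wavelength - speed * time)
    return amplitude * math.sin(phase)

def displace(vertices, time):
    """Offset the y coordinate of each (x, y, z) vertex by the wave height at x."""
    return [(x, y + wave_height(x, time), z) for (x, y, z) in vertices]

# A flat strip of vertices along the x axis, rippled at time zero.
flat_strip = [(float(i), 0.0, 0.0) for i in range(5)]
rippled = displace(flat_strip, time=0.0)
```

Every vertex needs one sine evaluation per frame here, which is why a single-cycle sine matters once a mesh has many thousands of vertices.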
For pixel shading, each of the sixteen pixel shaders can handle up to five shader operations at once (two vector ops, two scalar ops and a texture address op). The sixteen pipelines are organized into four quad pipes; each quad pipe is independent of the others, with its own dedicated resources.
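Taken at face value, those figures give a theoretical peak for the whole chip. The back-of-the-envelope Python sketch below just multiplies them out; the names are ours, and the result is an upper bound under the assumption that every pipeline issues all five operations every clock, not a measured throughput.

```python
# Figures quoted in the text above.
PIXEL_PIPELINES = 16
QUAD_PIPES = 4
OPS_PER_PIPELINE = {"vector": 2, "scalar": 2, "texture_address": 1}

def pipelines_per_quad(pipelines=PIXEL_PIPELINES, quads=QUAD_PIPES):
    """Number of pixel pipelines grouped into each quad pipe."""
    return pipelines // quads

def peak_ops_per_clock(pipelines=PIXEL_PIPELINES, ops=OPS_PER_PIPELINE):
    """Theoretical peak shader operations per clock across all pipelines."""
    return pipelines * sum(ops.values())
```

With the numbers above this works out to four pipelines per quad and 80 pixel shader operations per clock at peak.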
Like ATI’s previous DX9 architectures, all pixel shading operations in X800 are processed with full 24-bit precision, unlike the GeForce series, which will switch between 32-bit and 16-bit precision. ATI feels that for now, 24-bit precision is all that’s necessary. Moving up to 32-bit precision would bring too much of a performance hit to existing hardware, including X800. And while NVIDIA has been quick to promote its use of 32-bit precision in its GeForce FX and GeForce 6 series of graphics cards, ATI has openly challenged NVIDIA to provide a real world example of where FP24 is insufficient in comparison to NVIDIA’s FP32 implementation in GeForce FX/GeForce 6. In comparison, there are numerous examples where NVIDIA’s partial precision FP16 implementation suffers from banding and other artifacts.
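To put rough numbers on the precision debate, the Python sketch below rounds a value to a given number of mantissa bits and reports the spacing between representable values. The bit counts are an assumption on our part based on the commonly cited layouts (10 mantissa bits for FP16, 16 for FP24, 23 for FP32); the sketch ignores exponent range and denormals, so it is an approximation and not a model of either vendor's hardware.

```python
import math

# Assumed explicit mantissa widths; not taken from the article.
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}

def quantize(value, mantissa_bits):
    """Round value to the nearest number with the given explicit mantissa bits."""
    if value == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
    scale = 2 ** (mantissa_bits + 1)        # +1 for the implicit leading bit
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def spacing(value, mantissa_bits):
    """Gap between adjacent representable values near the given magnitude."""
    _, exponent = math.frexp(value)
    return math.ldexp(1.0, exponent - (mantissa_bits + 1))

# Rounding error for 1/3 under each assumed format.
errors = {name: abs(quantize(1.0 / 3.0, bits) - 1.0 / 3.0)
          for name, bits in MANTISSA_BITS.items()}
```

Under these assumptions, a value around 1000 (a texel coordinate on a large texture, say) can only be resolved in steps of 0.5 at FP16, against roughly 0.008 at FP24. That gap is consistent with the artifacts seen at partial precision, while the remaining step from FP24 to FP32 is much harder to spot.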
Moving forward, ATI’s Andy Thompson, Director of Advanced Technology Marketing, has told us that ATI does plan on implementing FP32 at some point, but 32-bit precision doesn’t become feasible until ATI moves to 0.09-micron.
When ATI develops a part, they first pick a die size for the market segment it’s aimed at (value, mainstream, high-end), then cram the features they want into that die (via transistors) as efficiently as possible. As an interim process change, 0.11-micron isn’t practical enough for their next high-end part. Instead, Mr. Thompson mentioned that it would be used for future ATI value products (although not necessarily the direct successor to RADEON 9000/9200). Things don’t get interesting for ATI from a next generation performance perspective until they hit 0.09-micron.
Other pixel shading improvements in X800 include an increase in the number of temporary registers available to pixel shaders, from 12 in the RADEON 9800 series to 32 in X800, and a new facing register that allows effects that vary depending on whether a surface faces toward or away from the viewer.
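As a rough illustration of what a facing value makes possible, here is a small Python sketch of two-sided shading. It is our own simplification: the facing sign is derived from the dot product of a surface normal and a direction to the viewer, whereas real hardware would supply it to the shader directly, and all names here are hypothetical.

```python
def facing(normal, to_viewer):
    """Return +1.0 if the surface faces the viewer, -1.0 if it faces away."""
    dot = sum(n * v for n, v in zip(normal, to_viewer))
    return 1.0 if dot >= 0.0 else -1.0

def shade_two_sided(normal, to_viewer, front_color, back_color):
    """Pick a different color for each side of the same surface."""
    return front_color if facing(normal, to_viewer) > 0.0 else back_color

# A leaf that is green on top and pale underneath, seen from above and below.
GREEN, PALE = (0.1, 0.6, 0.1), (0.6, 0.7, 0.5)
seen_from_above = shade_two_sided((0, 1, 0), (0, 1, 0), GREEN, PALE)
seen_from_below = shade_two_sided((0, 1, 0), (0, -1, 0), GREEN, PALE)
```

Without a facing value, an effect like this typically means drawing the geometry twice or duplicating it with flipped normals.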
If you recall, register usage is one of GeForce FX’s biggest weaknesses. While the RADEON 9800 PRO scaled well with increased register use, the number of instructions the GeForce FX 5900 Ultra executed per cycle fell off much more dramatically. This is why many applications cut the precision in half (from 32-bit to 16-bit) for GeForce FX cards, particularly those that use complex shaders such as those found in Half-Life 2.
ATI has also increased the maximum shader instruction count from 160 in previous architectures to 1,536 (512 vector, 512 scalar, 512 texture) in X800.
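The new budget is easy to sanity check. The Python sketch below adds up the three quoted limits and tests whether a shader's instruction mix would fit; treating the three limits as independent per-type caps is our assumption, and the example shader is made up.

```python
# Limits quoted in the text above.
X800_LIMITS = {"vector": 512, "scalar": 512, "texture": 512}
PREVIOUS_TOTAL = 160

def total_limit(limits=X800_LIMITS):
    """Combined instruction count across all instruction types."""
    return sum(limits.values())

def fits(shader_counts, limits=X800_LIMITS):
    """True if each instruction type stays within its own cap (assumed independent)."""
    return all(shader_counts.get(kind, 0) <= cap for kind, cap in limits.items())

# Hypothetical shader: heavy on vector math, light on texture fetches.
example_shader = {"vector": 300, "scalar": 120, "texture": 16}
growth = total_limit() / PREVIOUS_TOTAL
```

The total comes to 1,536 instructions, or 9.6 times the old 160-instruction ceiling, and the example shader above would have blown well past the old limit while fitting comfortably in the new one.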
Finally, ATI has improved its F-buffer to address memory management issues in the original implementation found in RADEON 9800/XT. As a result, the F-buffer can now deal with very large chunks of data, whereas before the data had to be broken up into smaller pieces in order to get around the memory issues. This will allow X800 to more efficiently support shaders of unlimited length and complexity.