HyperZ III/Anti-Aliasing Performance
Vertex Shader (cont’d)
With static T&L having been considered, what of vertex shader calculations? Well, just as with a pixel shader, vertex shader code complexity has a major impact on performance. Because R300’s vertex shader is capable of running approximately 65,000 instructions, a sufficiently long vertex program can reduce performance to a crawl, even when compared to the most complex pixel shader programs. The following performance numbers are from 3DMark 2001 SE’s vertex shader performance test:

For our final T&L benchmark we will consider real-world performance. To do this, we ran Comanche 4’s benchmark demo and recorded the T&L performance results (measured in millions of vertices per second). As the following chart shows, real-world performance is considerably lower than our theoretical benchmarks.

HyperZ III
The primary gain from HyperZ III comes from R300’s hierarchical Z-buffer. In the first article we discussed exactly how this operates, and in doing so we noted that scene rendering order is critical to making use of it. The reason is quite simple: the detection algorithm relies on what has already been rendered to determine whether the next pixel will be visible. If the scene renders back-to-front, every new pixel is closer to the viewer than the one before it, so no pixel can ever be culled. On the other hand, if the scene is rendered front-to-back, the nearest layer is rendered first, allowing all non-visible pixels behind this front layer to be detected and discarded.
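The effect of rendering order can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it models a plain per-pixel depth test rather than R300's actual tile-based hierarchical Z hardware, and the function name, screen size, and layer depths are all hypothetical. It counts how many pixels survive the depth test (and so would need to be shaded) when a stack of full-screen layers is drawn in different orders.

```python
import random


def count_shaded(layer_depths, width=8, height=8):
    """Count pixels that pass the depth test when full-screen layers are
    drawn in the given order (smaller depth = closer to the viewer).

    Simplified per-pixel model, not R300's hierarchical implementation.
    """
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for depth in layer_depths:
        for y in range(height):
            for x in range(width):
                # A pixel is kept only if it is closer than what is stored.
                if depth < depth_buffer[y][x]:
                    depth_buffer[y][x] = depth
                    shaded += 1
    return shaded


# Four hypothetical full-screen layers at increasing distance.
layers = [1.0, 2.0, 3.0, 4.0]

front_to_back = count_shaded(sorted(layers))
back_to_front = count_shaded(sorted(layers, reverse=True))

shuffled = layers[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable
random_order = count_shaded(shuffled)

print("front-to-back:", front_to_back)
print("back-to-front:", back_to_front)
print("random order: ", random_order)
```

In this model, front-to-back ordering shades each screen pixel exactly once, back-to-front shades every pixel of every layer, and a random order falls somewhere between the two.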
Using Humus’ GL_EXT_reme benchmark, we are able to examine the performance of different rendering orders and see the benefits associated with each. This benchmark examines front-to-back, back-to-front and random ordering. As we can see, the performance advantage of front-to-back ordering when using the hierarchical Z-buffer can be quite considerable.



SmoothVision 2.0
With R300’s use of multi-sampling and both color and Z-buffer compression, anti-aliasing no longer has the performance impact that it once did. In the days of super-sampling, there was little getting around the 75% performance loss, as it was all eaten up in fill-rate. Because multi-sampling requires only minimal additional fill-rate, that loss can be dramatically reduced. The following charts show the performance of anti-aliasing with 3DMark’s Complex Race Scene:



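The 75% figure quoted for super-sampling is consistent with a simple fill-rate argument, sketched below. The assumptions are that performance is limited purely by fill-rate and that every sample is rendered as a full pixel; the function name is hypothetical, and the 4-sample case is our reading of the figure rather than something stated above.

```python
def supersampling_loss(samples_per_pixel):
    """Fraction of fill-rate-limited performance lost when every sample
    is rendered as a full pixel (super-sampling).

    Assumes the frame rate scales inversely with pixels rendered.
    """
    if samples_per_pixel < 1:
        raise ValueError("samples_per_pixel must be at least 1")
    return 1.0 - 1.0 / samples_per_pixel


for samples in (1, 2, 4):
    print(f"{samples}x super-sampling: {supersampling_loss(samples):.0%} loss")
```

With four samples per pixel the card renders four times as many pixels, leaving a quarter of the original fill-rate-limited performance.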