Hierarchical Z-buffer
HyperZ – Theory and application
Graphics guru Ned Greene originally developed hierarchical Z-buffering for quickly rendering scenes with very high depth complexities. The general concept of hierarchical Z-buffering is to use multiple Z-buffer resolutions to progressively determine object visibility. Originally, Greene used octrees to efficiently implement this.
Of today’s 3D graphics accelerators, only ATI uses hierarchical Z-buffering. While not the complete implementation that Greene originally proposed, it can offer considerable gains in texturing. ATI’s hierarchical Z-buffering is part of what they call HyperZ, a set of hardware features designed to reduce bandwidth requirements and increase rasterization performance.
The hierarchical Z-buffering implementation works by first keeping a reference Z value for every 8x8 pixel block of the Z-buffer. This reference value must be the deepest (furthest) value of all pixels in the block, with each block being determined by tiling the buffer. Keeping each value creates a low-resolution Z-buffer, which is then used for determining a rough visibility estimate.
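As a rough sketch of the reduction described above, the following Python builds the low-resolution buffer from a full-resolution depth buffer by keeping the furthest depth in each 8x8 tile. The function name `build_coarse_z`, the list-of-rows layout, and the convention that larger values are farther from the viewer are assumptions made for illustration, not details of ATI's hardware.

```python
TILE = 8  # block size from the text: one reference value per 8x8 pixels


def build_coarse_z(zbuf, tile=TILE):
    """Return one reference depth per tile: the furthest (largest) Z in it.

    zbuf is a list of rows of depth values. Larger means farther away
    (an assumed convention). Width and height must be multiples of `tile`.
    """
    height, width = len(zbuf), len(zbuf[0])
    if height % tile or width % tile:
        raise ValueError("buffer dimensions must be multiples of the tile size")
    coarse = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            # Scan all pixels of this tile and keep the deepest one.
            row.append(max(zbuf[y][x]
                           for y in range(ty, ty + tile)
                           for x in range(tx, tx + tile)))
        coarse.append(row)
    return coarse
```

For a 16x8 buffer this yields a single row of two reference values, one per tile.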
Initially, the Z-buffer is cleared, or filled entirely with values of zero. To see a benefit from the hierarchical Z-buffer, at least one object must already have been rendered. We’ll assume that a single triangle covering the entire screen has been rendered. For each 8x8 block of the triangle, the deepest Z value is kept as a reference value. Each subsequent object is likewise broken down into 8x8 pixel blocks. For each new block, the nearest pixel Z value (ATI actually does this per-vertex with the new block) is compared to the reference value of the existing 8x8 block in the Z-buffer at the location where the new block would be rendered. If the existing block’s reference value is closer to the viewer than the new block’s value, every pixel of the new block is hidden, so the new block is culled and the next block is compared. However, if the new block’s value is nearer than the existing reference value, the new block must be rendered.
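The per-block test can be sketched as follows. This is a minimal illustration under stated assumptions, not ATI's implementation: depth is assumed to grow with distance, the incoming block is summarized by its nearest depth so that rejection is conservative, and the name `submit_block` and the flat 64-element lists are hypothetical.

```python
def submit_block(stored, incoming):
    """Test one incoming 8x8 block against the stored block at the same location.

    Both arguments are flat lists of 64 depths (larger = farther, assumed).
    Returns the resulting block and the number of per-pixel depth tests run.
    """
    reference = max(stored)          # furthest depth already in the block
    if min(incoming) >= reference:   # nearest new pixel is still behind it
        return stored, 0             # whole block culled, no per-pixel work
    # Otherwise every pixel of the block goes through the ordinary depth test.
    merged = [min(old, new) for old, new in zip(stored, incoming)]
    return merged, len(incoming)
```

With this sketch, a block that lies entirely behind the stored reference costs one comparison, while a block with even one nearer pixel sends all 64 pixels to the per-pixel test.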
The primary drawback, as some may have noted, is that if even a single pixel in the new 8x8 block is nearer to the viewer than the existing block’s deepest value, the entire block must be rendered. This reduces the total gain of hierarchical Z in comparison to early Z checking. On the other hand, with each reference value being stored on-chip (or requiring only a single value lookup per block), the memory bandwidth requirements are dramatically lower.