Early Z Checks
An ounce of prevention…
Early Z Checking (also known as Early Z Out) is a fairly basic idea that works around one of the inefficiencies of a traditional rendering pipeline. In a traditional pipeline, texturing operations are performed first, and the final result is typically a pixel written to the color buffer. To decide whether this write should occur, a depth (Z) test determines whether the pixel is visible. If the Z value of the new pixel is less than the buffered value, the pixel is visible and is written to the buffer. If the Z value is greater than the buffered value, the pixel is occluded and is discarded. This is inefficient because the pixel's visibility is determined only after its color value has been finalized, so texture bandwidth and fill rate are spent on pixels that are never displayed.
Early Z checks add an additional Z compare early in the pixel pipeline, before texturing operations are performed. Like a traditional Z compare, this initial compare tries to determine whether the pixel is visible. If it finds that the pixel is not visible (because the Z-buffer already holds a pixel with a nearer depth), the pixel is culled and texturing operations are not performed.
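The difference between the two approaches can be sketched in a few lines of Python. This is a toy software model, not a description of any real hardware: the `shade` helper, the `(x, z, color)` fragment tuples, and both function names are made up for illustration, and a fragment is assumed to pass only when its Z value is strictly less than the buffered value.

```python
def shade(color):
    """Stand-in for the expensive texturing work (hypothetical helper)."""
    return color


def draw_late_z(fragments, width):
    """Traditional pipeline: texture every fragment, then depth-test it."""
    depth = [float("inf")] * width   # Z-buffer, cleared to "infinitely far"
    color = [None] * width           # color buffer
    textured = 0
    for x, z, c in fragments:
        result = shade(c)            # texturing happens first
        textured += 1
        if z < depth[x]:             # visibility is only decided here
            depth[x] = z
            color[x] = result
    return color, textured


def draw_early_z(fragments, width):
    """Early Z: depth-test before texturing and skip occluded fragments."""
    depth = [float("inf")] * width
    color = [None] * width
    textured = 0
    for x, z, c in fragments:
        if z >= depth[x]:            # a nearer pixel is already stored
            continue                 # culled: no texturing is performed
        color[x] = shade(c)
        textured += 1
        depth[x] = z
    return color, textured


# One pixel location, with the near surface drawn before the far one.
fragments = [(0, 0.2, "near"), (0, 0.8, "far")]
late_color, late_textured = draw_late_z(fragments, width=1)
early_color, early_textured = draw_early_z(fragments, width=1)
print(late_color, late_textured)    # ['near'] 2
print(early_color, early_textured)  # ['near'] 1
```

Both versions produce the same image, but the early Z version textures only the fragment that ends up visible.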
Ordering matters
When a graphics scene is rendered, it is often rendered in a specific order. Some applications render the scene back to front: the objects furthest away are rendered first, followed by objects progressively nearer to the viewer. This is typically the worst case for a 3D accelerator, because a color write must occur for every pixel at every depth. Other applications render front to back, with the objects nearest the viewer rendered first. This is optimal for hardware, as it requires only a single color write for each pixel location. The final case is random-order rendering, where scene objects come down the pipeline in no particular order. This is sub-optimal, though better than back-to-front, as only some pixel locations receive more than one color write.
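The effect of ordering on color writes can be counted with a small model of a single pixel location. The `color_writes` function and the five depth values below are illustrative assumptions, not figures from any real scene.

```python
import random


def color_writes(depths):
    """Count color-buffer writes at one pixel for fragments drawn in order."""
    nearest = float("inf")       # cleared Z-buffer value
    writes = 0
    for z in depths:
        if z < nearest:          # fragment passes the depth test
            nearest = z
            writes += 1          # so its color is written
    return writes


layers = [0.1, 0.3, 0.5, 0.7, 0.9]             # five surfaces covering the pixel
front_to_back = sorted(layers)
back_to_front = sorted(layers, reverse=True)
shuffled = layers[:]
random.shuffle(shuffled)

print(color_writes(front_to_back))   # 1
print(color_writes(back_to_front))   # 5
print(color_writes(shuffled))        # somewhere from 1 to 5
```

Front-to-back order always needs exactly one write per location and back-to-front needs one per surface, while a random order falls somewhere in between.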
For early Z checking, front-to-back ordering is optimal, just as on a traditional pipeline. With a front-to-back order, all visible portions of the scene are rendered immediately, so the early Z compare can read the needed depth values and determine that any pixels not at the nearest depth are not visible and can be culled. Random-order rendering allows some pixels to be culled, but the exact gain cannot be determined in advance, as it varies from scene to scene. Any application that renders back to front will see no gain from early Z checking. In that case, if the hardware does not disable early Z checking (and assuming it is bandwidth limited, as all current hardware is), there would actually be a loss in performance. Why is this?
Just as on a traditional rendering pipeline, hardware that supports early Z checks must still perform a Z compare late in the pipeline. This is true for a variety of reasons, such as the possibility of a pixel shader modifying the Z value of a pixel. Another is that an application might replace a pixel's depth value using the depth instructions in DX8.1 Pixel Shaders 1.3 and 1.4, such as TexDepth. In such a case, the hardware would not only need a late Z compare but would also want to disable the early Z compare for the primitive associated with the TexDepth modification.
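A rough way to see both points is to count Z-buffer reads in a model that keeps the late compare and adds an optional early one. Everything here is an assumption made for illustration: the `Fragment` type, the `shader_z` field standing in for a depth written by the shader, the per-fragment (rather than per-primitive) disabling of the early compare, and the cost of one Z read per compare. Real hardware may organize this quite differently.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    z: float                          # depth interpolated by the rasterizer
    shader_z: Optional[float] = None  # depth written by the shader, if any


def render_pixel(fragments, early_z_enabled=True):
    """Render fragments at one pixel; return final depth and cost counters."""
    depth = float("inf")
    stats = {"z_reads": 0, "textured": 0, "culled_early": 0}
    for f in fragments:
        # The early compare is only safe if the shader will not change Z.
        if early_z_enabled and f.shader_z is None:
            stats["z_reads"] += 1
            if f.z >= depth:
                stats["culled_early"] += 1
                continue              # culled before any texturing
        stats["textured"] += 1        # texturing / shading cost
        final_z = f.z if f.shader_z is None else f.shader_z
        stats["z_reads"] += 1         # the late compare is always performed
        if final_z < depth:
            depth = final_z
    return depth, stats


back_to_front = [Fragment(0.9), Fragment(0.5), Fragment(0.1)]
front_to_back = [Fragment(0.1), Fragment(0.5), Fragment(0.9)]

print(render_pixel(back_to_front, early_z_enabled=False)[1])
# {'z_reads': 3, 'textured': 3, 'culled_early': 0}
print(render_pixel(back_to_front, early_z_enabled=True)[1])
# {'z_reads': 6, 'textured': 3, 'culled_early': 0}
print(render_pixel(front_to_back, early_z_enabled=True)[1])
# {'z_reads': 4, 'textured': 1, 'culled_early': 2}

# A shader that pulls a "far" fragment in front of the stored depth.
print(render_pixel([Fragment(0.1), Fragment(0.5, shader_z=0.05)])[0])  # 0.05
```

Under these assumptions, back-to-front rendering with early Z enabled doubles the Z reads without culling anything, which is one plausible source of the loss on bandwidth-limited hardware. The last line shows why the early compare is skipped when the shader writes depth: testing the interpolated value 0.5 against the stored 0.1 would have culled a fragment that is actually visible.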