
How It Works: Fragment Anti-Aliasing
December 04, 2002 Dave Barron

Summary: By now we've all heard of the various anti-aliasing techniques employed by the latest video cards. Multi-sampling is currently the implementation of choice for ATI and NVIDIA, but what about fragment level anti-aliasing such as that used by the Matrox Parhelia? In this article, Dave explores the advantages and pitfalls of Fragment AA. See how it works in today's article!


Introduction

It is often true that many paths lead to the same destination. This holds in life, whether the goal is financial success or success with members of the opposite sex: different people take different approaches, yet each approach can succeed. But this isn’t a site about dating and finances, now is it? We are here to discuss hardware and games, so how does this relate?

Achieving a certain result in chip design can be accomplished by a variety of means. Pixel shaders, for example, can be designed in many different ways. Not only that, but each design can be tuned in very specific ways to balance die size against operational efficiency. However, in this discussion we will not be looking at pixel shader architecture. Rather, today we will discuss the use of coverage masks in fragment level anti-aliasing algorithms.


How does taking different roads relate to anti-aliasing? Just as with the pixel shader example mentioned above, anti-aliasing can be achieved using a variety of techniques. Each method delivers effectively the same end result, yet each carries its own set of advantages and disadvantages.

While this article focuses on fragment level anti-aliasing algorithms, it is important to understand multi-sampling implementations as well. With such systems becoming standard on all NVIDIA hardware, and ATI now using multi-sampling in its high-end RADEON 9500 and RADEON 9700 boards, it is quickly becoming the graphics industry's anti-aliasing technique of choice. Understanding this approach will allow us to compare it with fragment level approaches and thus determine which implementation is most desirable.

Multi-Sampling Explained

Multi-sampling algorithms share many characteristics with super-sampling. Both render a scene to a high-resolution buffer and filter it down to achieve the anti-aliasing effect. Yet that is where the similarities end.

For this discussion, we will assume 4x anti-aliasing is being used. While multi-sampling can be implemented in several different ways, we will consider only one such method.



SIDEBAR: 3dfx’s Voodoo5 truly introduced the world to anti-aliasing. The Voodoo5 utilized super-sampling.



Multi-Sampling/Fragment Algorithms

Displaying the final image

The first step in multi-sampling is to treat the scene as if it were doubled in size both horizontally and vertically (so the image is four times larger in total). A triangle edge mask is used to locate the pixels that fall along each triangle edge. Edge pixels take a unique color sample for every sub-pixel, writing four separate color and Z values to their respective buffers. Pixels not located on a triangle edge, on the other hand, share a single color value, writing four identical color values to the color buffer. Every depth (Z) value, however, is calculated independently.
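To make the write step concrete, here is a minimal Python sketch of the 4x scheme just described. It illustrates the idea under our own assumptions rather than any vendor's hardware: sub-samples sit at the centers of a 2x2 grid, coverage is tested with edge functions on a triangle given in counter-clockwise order (y-up coordinates), and shade() and depth() are hypothetical callbacks standing in for the rest of the pipeline.

```python
# 2x2 sub-sample positions inside a pixel (offsets from its top-left corner).
SUB_OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def edge_fn(a, b, p):
    """Twice the signed area of (a, b, p); positive when p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """Strict inside test for a counter-clockwise triangle."""
    a, b, c = tri
    return edge_fn(a, b, p) > 0 and edge_fn(b, c, p) > 0 and edge_fn(c, a, p) > 0


def write_pixel(tri, px, py, shade, depth):
    """Return one entry per sub-sample: (color, z) if the triangle covers it,
    or None if it does not (that buffer slot is left untouched)."""
    pts = [(px + dx, py + dy) for dx, dy in SUB_OFFSETS]
    covered = [inside(tri, p) for p in pts]
    interior = all(covered)
    # Interior pixel: shade once at the pixel center and replicate the color.
    shared = shade((px + 0.5, py + 0.5)) if interior else None
    out = []
    for p, cov in zip(pts, covered):
        if not cov:
            out.append(None)
            continue
        # Edge pixel: a separate color sample for each covered sub-pixel.
        color = shared if interior else shade(p)
        # Z is always computed per sub-sample.
        out.append((color, depth(p)))
    return out
```

The point of the split is visible in the shade() calls: an interior pixel is shaded once no matter how many sub-samples it has, and only edge pixels pay for extra color samples.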

When the scene is finalized, the image is down-sampled, often using a run-of-the-mill bilinear filter: the four color values are averaged into a single pixel, which becomes the displayed pixel. Because each sub-pixel of an edge pixel was sampled separately, the filtered value is an average of four distinct samples along the edge, producing a more accurate, anti-aliased value. Once this is complete for the entire frame, the buffer is flipped and the image is displayed.
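The down-sample can be sketched the same way. This hypothetical resolve() averages each 2x2 block of sub-samples with equal weights, which is what a bilinear lookup taken exactly at the center of the block reduces to.

```python
def resolve(hi_res, width, height):
    """Down-sample a (2*height) x (2*width) grid of RGB tuples to
    width x height by averaging each 2x2 block of sub-samples."""
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the four sub-samples belonging to pixel (x, y).
            block = [hi_res[2 * y + j][2 * x + i] for j in (0, 1) for i in (0, 1)]
            # Equal-weight average per color channel.
            row.append(tuple(sum(c[k] for c in block) / 4 for k in range(3)))
        out.append(row)
    return out
```

An edge pixel with one white and three black sub-samples resolves to 25% gray, which is exactly the softened edge value multi-sampling is after.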

This, of course, is just a general overview of one multi-sampling implementation. The actual process is slightly more complex, but for our purposes here, this level of understanding will suffice.

The advantages of multi-sampling over super-sampling are twofold. First, multi-sampling requires little additional fill-rate in the implementation we discussed, and other implementations can require no additional fill-rate at all. Second, because interior pixels require only a single texture sample, texture reads increase only slightly (e.g., a bilinear-filtered pixel requires 4 texel reads, whereas a 4x super-sampled bilinear-filtered pixel requires 16).
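The texture-read arithmetic can be checked with a quick estimate. In this sketch the share of edge pixels is a hypothetical input, and we assume bilinear filtering (4 texel reads per sample) and the 4x scheme described above, in which only edge pixels are textured per sub-pixel.

```python
def texel_reads(pixels, edge_fraction, subsamples=4, taps=4):
    """Rough texel-fetch counts per frame with bilinear filtering (taps=4).

    edge_fraction is the (hypothetical) share of pixels on triangle edges.
    Returns (no_aa, super_sampled, multi_sampled)."""
    no_aa = pixels * taps
    # Super-sampling textures every sub-sample of every pixel.
    super_sampled = pixels * subsamples * taps
    # Multi-sampling (as described above): interior pixels are textured once,
    # edge pixels once per sub-sample.
    edge = pixels * edge_fraction
    multi_sampled = (pixels - edge) * taps + edge * subsamples * taps
    return no_aa, super_sampled, multi_sampled
```

For a 1024x768 frame with an assumed 5% edge pixels, this gives about 3.1 million texel reads without anti-aliasing, about 12.6 million with super-sampling, and about 3.6 million with multi-sampling.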

On the negative side, buffer storage requirements are extensive, as is bandwidth consumption. Just as with super-sampling, the color and Z buffers become four times larger than at the selected resolution, and bandwidth consumption increases roughly linearly with buffer size.

Fragment Level Algorithms

Fragment level algorithms work not on sub-pixels but on fragments. A sub-pixel is effectively an entire pixel of its own, whereas a fragment is simply a segment of a complete pixel: a sub-pixel stores a full color and Z value, while a fragment stores only information about one segment of a complete pixel.

Buffer sizes do not increase with fragment level algorithms, as the number of pixels processed depends only on the selected resolution. The only additional storage is the separate buffer that holds the fragment data. This requirement is notably smaller than with multi-sampling, since similar anti-aliasing levels can be achieved with relatively few fragments.




SIDEBAR: NVIDIA nearly stole some of 3dfx’s thunder with their Detonator 5 release. One unsupported feature was anti-aliasing!


Fragment Level AA Explained

When rendering a scene with fragment level anti-aliasing (such as that used by the Matrox Parhelia-512), all operations are performed as they normally would be. The scene is rendered and all is well. The difference is that an additional stage is added to the pixel pipeline. This stage lays a coverage mask over each pixel, with each section of the mask being a fragment. When the mask is laid over an edge pixel, each fragment is tested to determine whether it falls within the triangle in question. A fragment inside the triangle is assigned a value of one; a fragment outside the edge is assigned a value of zero.

[image]
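A coverage mask of this kind is easy to sketch in Python. We assume a regular 4x4 grid (16 fragments, matching the count Matrox quotes, though whether Parhelia's mask is regular or staggered is not known), with each fragment tested at its center against the triangle's edge functions. The bit layout and the counter-clockwise winding are our own choices.

```python
GRID = 4  # 4x4 = 16 fragments per pixel (regular grid assumed)


def edge_fn(a, b, p):
    """Positive when p lies to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def coverage_mask(tri, px, py):
    """Bit (fy * GRID + fx) is 1 if that fragment's center is inside tri.

    tri is three (x, y) vertices in counter-clockwise order."""
    a, b, c = tri
    mask = 0
    for fy in range(GRID):
        for fx in range(GRID):
            p = (px + (fx + 0.5) / GRID, py + (fy + 0.5) / GRID)
            if edge_fn(a, b, p) > 0 and edge_fn(b, c, p) > 0 and edge_fn(c, a, p) > 0:
                mask |= 1 << (fy * GRID + fx)
    return mask


def coverage(mask):
    """Fraction of the pixel covered: number of 1 bits divided by 16."""
    return bin(mask).count("1") / (GRID * GRID)
```

For a triangle with vertices (0,0), (8,0) and (0,8), pixel (1,1) is fully covered (mask 0xFFFF), while pixel (3,4) sits on the long edge and has 6 of its 16 fragments covered, a coverage of 0.375.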



Once the fragment data for a pixel has been determined, it is carried along the pipeline as the pixel is rendered. When the pixel is complete, it is written as a normal pixel, without any anti-aliasing, while the fragment data is written to a fragment buffer and stored for later use.

Filtering takes place after the scene is complete, and this is where the fragment data comes into play. Because numerous filter types exist, the exact method used is entirely up to the engineers designing the system. Any filter shape can be used, such as a box, X, or cross shape, or even the ever-popular quincunx pattern.

Filtering works by noting what percentage of the pixel lies within the triangle and what percentage lies outside it. To filter properly, the filter must also consider which side(s) of the pixel are inside and outside the triangle.

With this information available, the pixel in question is blended with neighboring pixels. While this is technically blurring, and thus not ideal, one might consider it a “smart blur”: it blurs only along edges, so the rest of the image is not distorted, and it blends each edge pixel by a calculated amount with the neighboring pixels selected for the purpose. We will discuss the level of quality actually delivered later in this article.
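One plausible version of this filter can be sketched as follows. It is our own illustration, not Matrox's undisclosed method: the coverage fraction comes from a 16-fragment mask, the uncovered side is estimated from the average position of the uncovered fragments, and the pixel is blended with the neighbor on that side. The mask layout (bit fy * 4 + fx) and the function names are hypothetical.

```python
GRID = 4  # 4x4 fragment mask, bit (fy * GRID + fx)


def sign(v):
    return (v > 0) - (v < 0)


def uncovered_side(mask):
    """Neighbor offset (dx, dy), each -1, 0 or 1, pointing from the pixel
    center toward the average position of its uncovered fragments."""
    offsets = [
        (i % GRID + 0.5 - GRID / 2, i // GRID + 0.5 - GRID / 2)
        for i in range(GRID * GRID)
        if not mask & (1 << i)
    ]
    if not offsets:
        return (0, 0)  # fully covered: nothing to blend with
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    return (sign(mx), sign(my))


def filter_edge_pixel(image, x, y, mask):
    """Blend pixel (x, y) of image (rows of RGB tuples) with the neighbor on
    its uncovered side, weighted by the fraction of fragments covered."""
    cov = bin(mask).count("1") / (GRID * GRID)
    dx, dy = uncovered_side(mask)
    # Clamp the neighbor position to the image bounds.
    nx = min(max(x + dx, 0), len(image[0]) - 1)
    ny = min(max(y + dy, 0), len(image) - 1)
    own, other = image[y][x], image[ny][nx]
    return tuple(cov * o + (1 - cov) * n for o, n in zip(own, other))
```

A pixel whose left half is covered (mask 0x3333) ends up as a 50/50 blend with its right-hand neighbor, while a fully covered pixel is left unchanged, which is the "smart blur" behavior described above.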

Specific Techniques

There are a variety of ways to implement a fragment level anti-aliasing algorithm. For example, Matrox has its proprietary implementation (which it simply calls FAA), and Bitboys has MatrixAA. Other implementations exist, though none has seen the light of day in hardware. That said, we will discuss a couple of implementations to better understand how the different algorithms function.




SIDEBAR: Matrox was founded in 1976, seventeen years before NVIDIA was born.


Matrox’s Fragment AA

Our first implementation, which we attribute to Matrox, operates by buffering on a scanline basis. We believe Matrox does something like this because it provides a 128-pixel buffer on its chip. As each scanline is rendered, the edge pixels are located and written to an on-chip buffer, creating a fragment list. This list stores the location of each edge pixel, as well as additional Z, color, and alpha data.
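The fragment list described above might be modeled like this. The record fields mirror the data listed above (location, plus additional Z, color, and alpha), but the layout, the FragmentRecord name, and the rule that a pixel counts as an edge pixel when its coverage mask is partial are our assumptions.

```python
from dataclasses import dataclass

FULL_MASK = 0xFFFF  # all 16 fragments covered


@dataclass
class FragmentRecord:
    x: int        # position of the edge pixel along the scanline
    mask: int     # 16-bit coverage mask
    z: float      # additional Z value
    color: tuple  # additional color value
    alpha: float  # additional alpha value


def build_fragment_list(scanline):
    """scanline holds one (mask, z, color, alpha) tuple per pixel, in x order.
    Only partially covered (edge) pixels get a record; interior and empty
    pixels are written to the normal buffers only."""
    return [
        FragmentRecord(x, mask, z, color, alpha)
        for x, (mask, z, color, alpha) in enumerate(scanline)
        if 0 < mask < FULL_MASK
    ]
```

Because only edge pixels produce records, the list's length depends on scene content, which is also why the fragment buffer's size varies from frame to frame.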

One can only speculate what Matrox does with the additional data stored within the fragment buffer, as the company will not reveal it for patent reasons. The additional Z data is likely for the portion of the pixel that falls outside the original triangle. When the percentage of a pixel within a triangle is calculated, the section inside the triangle will logically have a different depth value than the section outside it. Without this additional data, the edge could mistakenly read the neighboring triangle, producing false coverage information and, in turn, artifacts.

The reason for storing the additional color and alpha data is less clear. Perhaps it provides additional blending information, or perhaps the anti-aliasing filter cannot access the depth and color buffers, requiring all edge data (color, alpha, Z) to be stored in the fragment buffer. Whatever the case, each edge pixel’s data is stored within the fragment buffer.

Potential Setbacks

This presents an interesting situation. First and foremost, buffer size becomes an issue: what happens if a scanline has more than 128 edge pixels? Perhaps Matrox can write out a portion of the scanline to external memory, flush the cache, and then resume the scanline.

If so, this will stall the chip and reduce performance. If not, and a scanline has more than 128 edge pixels, some pixels will not be anti-aliased because they will not fit into the buffer.
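The two outcomes can be compared with a small model of the 128-entry buffer. The "spill" and "drop" policies are hypothetical stand-ins for the two possibilities discussed above; we do not know which, if either, Matrox uses.

```python
CACHE_ENTRIES = 128  # on-chip fragment buffer capacity


def scanline_outcome(edge_pixels, policy):
    """Return (anti_aliased, flushes, aliased) for one scanline.

    'spill': when the buffer fills, flush it to external memory and resume,
             so every edge pixel is anti-aliased but each flush stalls the chip.
    'drop':  edge pixels beyond the buffer's capacity are simply not
             anti-aliased."""
    if policy == "spill":
        flushes = max(0, (edge_pixels - 1) // CACHE_ENTRIES)
        return edge_pixels, flushes, 0
    if policy == "drop":
        kept = min(edge_pixels, CACHE_ENTRIES)
        return kept, 0, edge_pixels - kept
    raise ValueError(f"unknown policy: {policy}")
```

Under this model, a scanline with 300 edge pixels would cost two flushes with the spill policy, or leave 172 edge pixels aliased with the drop policy.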

As noted, only edge pixels store fragment data. This can create a problem when one triangle overlaps another, because the overlap creates a new edge on the existing triangle. With no coverage data available for those non-edge pixels, anti-aliasing will likely not occur. This may explain reports that certain edges seemingly receive no anti-aliasing with Matrox’s algorithm.

Much of the above is a mixture of fact and speculation. The known facts about Matrox’s FAA are:

· Includes a 128-pixel cache
· Fragment buffer stores fragment data, as well as additional Z, color, and alpha data
· Fragment buffer size varies from frame to frame
· Stores 16 fragments per pixel (it is not known whether this is a regular or staggered mask)



SIDEBAR: Matrox recently unveiled a 128MB version of its Parhelia card.


Alternative FAA Techniques/Performance & Quality

Alternatively, one might store coverage data for the entire scene in local memory. Such a method would place a coverage mask over every pixel and store the data for each one, even where no edge exists. Compared with Matrox’s approach of storing only edge pixels, this likely increases buffer storage requirements, since every pixel requires a stored value.
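The storage trade-off can be estimated with simple arithmetic. This sketch assumes one 16-bit mask per pixel for the full-scene approach; the record size for the edge-only list is a hypothetical parameter, since Matrox has not published its format.

```python
def full_scene_mask_bytes(width, height, fragments=16):
    """Full-scene approach: one coverage mask per pixel, edge or not."""
    return width * height * fragments // 8


def edge_list_bytes(edge_pixels, record_bytes):
    """Edge-only approach: record_bytes is a hypothetical per-record size."""
    return edge_pixels * record_bytes
```

At 1024x768, a full-scene mask buffer takes 1,572,864 bytes (1.5 MB) every frame regardless of content, whereas an edge-only list grows and shrinks with the number of edges in the scene.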

This alternate implementation offers a couple of key advantages. First and foremost, every pixel has coverage data available, which solves the issue of certain edges remaining aliased: the pixel value can be read back and the edge recalculated. Additionally, there is no risk of buffer overflows, which would otherwise result in additional aliased pixels or pipeline stalls.

Performance

Fragment level algorithms hold a key performance advantage over multi-sampling and super-sampling algorithms. Where 4x super-sampling requires four times the fill-rate and bandwidth (a 75% reduction in effective fill-rate), and multi-sampling requires nearly the same level of bandwidth, fragment level algorithms require no additional fill-rate and consume far less bandwidth.

For example, with 4x super-sampling and multi-sampling, the Z and color buffers each increase in size by 300%. If you have a 1 MB color buffer and a 1 MB Z buffer without anti-aliasing, super-sampling and multi-sampling will increase each to 4 MB, so the combined 2 MB storage requirement increases to 8 MB. In the same situation, a fragment level algorithm simply requires an additional 1 MB buffer, so storage requirements increase from 2 MB to only 3 MB.
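The same arithmetic in code, using the 1 MB buffers from the example above and assuming 4 samples per pixel for super-sampling and multi-sampling:

```python
def supersample_or_multisample_mb(color_mb, z_mb, samples=4):
    # Color and Z buffers both grow with the number of samples per pixel.
    return (color_mb + z_mb) * samples


def fragment_aa_mb(color_mb, z_mb, fragment_buffer_mb):
    # Buffers stay at the display resolution; only a fragment buffer is added.
    return color_mb + z_mb + fragment_buffer_mb


print(supersample_or_multisample_mb(1, 1))  # 8
print(fragment_aa_mb(1, 1, 1))              # 3
```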

We can thus see that fragment level algorithms have minimal performance impact, with no fill-rate reduction and only minor bandwidth increases. The only performance losses associated with such an algorithm come from potential pipeline stalls and from filtering the final image.

Quality

Examining the quality of a fragment level algorithm is rather tricky. Super-sampling’s quality is very predictable: if you set an anti-aliasing level, you know exactly what to expect. Fragment level algorithms, on the other hand, can produce a variety of different results.

On a typical edge, such as that of a wall, a fragment level algorithm will provide a very high quality result. It will often be noticeably superior to both super-sampling and multi-sampling, because the pixel's coverage has been calculated more accurately and a finer blend is produced. However, as can also happen with multi-sampling, certain edges will receive no anti-aliasing. These include alpha-blended surfaces, whose edges algorithms typically cannot detect, as well as surfaces that are simulated by textures (such as leaves on trees). Additionally, fragment level algorithms cannot anti-alias triangle intersections. While this is usually not an issue, it is something a trained eye will notice. Consider the following two images to get an idea of the level of quality provided by a fragment algorithm. The first is of Bitboys’ MatrixAA and the second is of Matrox’s FAA implementation.

[image: Bitboys’ MatrixAA]

[image: Matrox’s FAA]

Finally, the last obstacle for fragment level algorithms is blurring. While generally not a problem, it can become one when triangles approach the size of a pixel. For example, if a group of triangles only 1-2 pixels in size are joined together and the filter is applied over all of them, texture detail will be lost, because nearly every pixel is an edge pixel and thus requires filtering.
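The effect of shrinking triangles can be illustrated with a simple model: a square tile is split along its diagonal into two triangles, each pixel is tested with a 4x4 (16-fragment) mask, and every partially covered pixel is counted as an edge pixel. The pixel-aligned tile borders are an assumption that keeps the count simple; real meshes will generally have more edge pixels.

```python
GRID = 4  # 4x4 = 16 fragments per pixel


def is_edge_pixel(px, py):
    """True if the diagonal y = x splits pixel (px, py): some of its 16
    fragment centers lie below the line and some do not."""
    inside = sum(
        1
        for fy in range(GRID)
        for fx in range(GRID)
        if py + (fy + 0.5) / GRID < px + (fx + 0.5) / GRID
    )
    return 0 < inside < GRID * GRID


def edge_pixel_fraction(size):
    """Share of edge pixels in a size x size tile split along its diagonal
    (tile borders assumed to fall on pixel boundaries)."""
    edges = sum(is_edge_pixel(x, y) for y in range(size) for x in range(size))
    return edges / (size * size)
```

In this model the edge-pixel share is 1/size: a 64-pixel tile has only 1/64 of its pixels filtered, but at a tile size of 2 pixels half of all pixels are filtered, and at 1 pixel every pixel is.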




SIDEBAR: Tuan took some screenshots of 16x Fragment AA in his review, but to be honest the best way to judge AA quality really is to sit down and look at it. Also keep in mind that it can be subjective.


Conclusion

Final Thoughts

The two approaches in use at present are multi-sampling and fragment level algorithms. Certain low-end parts continue to provide super-sampling exclusively, as it requires no additional hardware, but with the next generation of graphics processors we can expect super-sampling to be phased out, with all parts adopting multi-sampling or fragment level algorithms.

With any algorithm there will be tradeoffs. Super-sampling requires very high levels of fill-rate; multi-sampling alleviates this requirement but still consumes high levels of bandwidth (though this is improving through data compression). Fragment algorithms largely avoid these performance issues and often deliver high quality, though certain surfaces will not receive the desired anti-aliasing.

Matrox is the main proponent of fragment level anti-aliasing today. At present, its implementation still has some issues, but rumor has it that these will be corrected in its next-generation part (though recent reports indicate this part will never see the light of day). If these issues are corrected, it is certainly a worthwhile feature.

As for Bitboys, we know that it has a fragment level anti-aliasing algorithm called MatrixAA, yet no information has been provided about its exact functionality. As a result, it is difficult even to speculate on how it operates internally.

Whether or not one decides to go the fragment route, it is certainly an interesting alternative to have available. Even if it is not used as the primary anti-aliasing method, nothing prevents fragment anti-aliasing from being offered as a low-cost alternative for the most demanding applications, or even being used in conjunction with other techniques. Without question, this approach is a viable alternative worth everyone's consideration.




SIDEBAR: Is fragment AA the way of the future or will multi-sampling continue to dominate?

© Copyright 2003 FS Media, Inc.