Back to super-sampling
As discussed earlier, super-sampling carries performance costs that make it clear a more efficient anti-aliasing implementation is needed for today's level of graphics technology.
We know super-sampling works by averaging multiple sub-pixels into each single output pixel. For example, at 800x600 with 2x2 (4x) anti-aliasing, you are effectively rendering at 1600x1200 and then averaging 2x2 grids of pixels to produce the output. This is very expensive because for every one pixel output, four pixel inputs are needed (which means four color values and four Z values). Here is an illustration of this:
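The averaging step can be sketched in a few lines. This is a minimal illustration, not any particular hardware's implementation; the function name and the row-major list of RGB tuples are assumptions made for the example.

```python
def downsample_2x2(hi_res, width, height):
    """Average each 2x2 block of a high-res buffer into one output pixel.

    hi_res is a row-major list of (r, g, b) tuples at twice the output
    resolution in each dimension, i.e. (2*width) x (2*height) sub-pixels.
    """
    out = []
    for y in range(height):
        for x in range(width):
            # Gather the four sub-pixels that map to output pixel (x, y).
            samples = [hi_res[(2 * y + dy) * (2 * width) + (2 * x + dx)]
                       for dy in (0, 1) for dx in (0, 1)]
            # Average each color channel across the four samples.
            out.append(tuple(sum(channel) // 4 for channel in zip(*samples)))
    return out
```

Note that every output pixel requires four reads from the high-resolution buffer, which is exactly the fill-rate cost the text describes.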
So the problem with super-sampling is that it simply demands too much fill-rate. You either require a separate pipeline for each sample (as in a multi-sample combining implementation) or multiple clocks to render all of the samples in each pixel. Either way is slow and inefficient.
Multi-sampling is like super-sampling in that you are still effectively rendering at four times your base resolution, but here all of the sub-samples share the color value of the original sample. What differs between the sub-pixels is depth: each sample has its own unique Z value. Here is an illustration of the sub-pixels when multi-sampling is used.
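The key saving can be shown in a small sketch of the resolve step for one pixel: the color is computed once, while the depth test runs at each of the four sub-sample depths. The function name and the flat background color are assumptions for illustration only.

```python
def resolve_msaa_pixel(pixel_color, sample_depths, depth_buffer, background):
    """Resolve one 4x multi-sampled pixel.

    pixel_color   -- the single shaded color shared by all sub-samples
    sample_depths -- one Z value per sub-sample (this is what is unique)
    depth_buffer  -- the stored Z value at each sub-sample position
    background    -- color showing through where the depth test fails
    """
    # Each sub-sample contributes the shared color only if its own Z
    # passes the depth test; color is never re-shaded per sample.
    covered = [pixel_color if z <= stored else background
               for z, stored in zip(sample_depths, depth_buffer)]
    # Average the per-sample results, as in super-sampling's final step.
    return tuple(sum(channel) // len(covered) for channel in zip(*covered))
```

With two of four samples passing, an edge pixel blends halfway toward the background, which is how multi-sampling smooths triangle edges without shading four times.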
How do we do it?
To really understand multi-sampling we need to have an understanding of the pixel pipeline and how it operates when multi-sampling is in use. When looking at this illustration, keep in mind that there are a variety of possible implementations; this is simply a generic and somewhat optimal situation.
The multi-sampling pixel pipeline
Note that the initial stage of the pipeline does your typical texture lookup, retrieves your Z value, and sends the Z-slope down the pipeline. This stage would also include anything involving texture combining for pixel shaders. In the next stage, sample coverage is determined to see which sub-samples fall within a given triangle. You are then effectively working at 4x the base resolution: the coverage information determines which sub-samples are rendered, and the Z-slope lets a depth be computed at each sub-sample's position (assuming a jittered sample pattern). Finally, each Z-unit calculates the Z value of its sample, and the samples are rendered.
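The per-pixel work described above can be sketched as follows. This is a toy model of one generic implementation, not real hardware: the function names, the jittered offset pattern, and the plane-equation form of the Z-slope are all assumptions made for the example.

```python
def shade_multisampled_pixel(px, py, inside_triangle, base_z, dz_dx, dz_dy, shade):
    """Produce the covered (color, z) sub-samples for one pixel.

    inside_triangle -- coverage test: is a sub-sample position in the triangle?
    base_z          -- depth at the pixel center
    dz_dx, dz_dy    -- the Z-slope sent down the pipeline
    shade           -- texture lookup / pixel shading function
    """
    # A hypothetical jittered 4x sub-sample pattern within the pixel.
    offsets = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
    # Shading happens ONCE per pixel -- this is the multi-sampling saving.
    color = shade(px, py)
    samples = []
    for ox, oy in offsets:
        # Coverage test at each sub-sample position.
        if inside_triangle(px + ox, py + oy):
            # The Z-slope gives each covered sample its own depth value.
            z = base_z + dz_dx * (ox - 0.5) + dz_dy * (oy - 0.5)
            samples.append((color, z))
    return samples
```

On an edge pixel only some of the four offsets pass the coverage test, so only those sub-samples carry the shared color and their individual depths onward to the Z-units.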
Of course, there are many different ways to implement multi-sampling, and some hardware may perform these steps in a different order. For example, the Z values for each sub-sample might be computed very early in the pipeline, or sample coverage might be determined later. Some implementations might require multiple clocks as well. Everything depends on how the developer implements multi-sampling.