Date: 10/06/1996
Forum: comp.sys.ibm.pc.hardware.video
I first want to say how rewarding it is to read all your
reviews after having worked on the design of Voodoo Graphics
(the chipset on the Orchid Righteous 3D board) for over two years.
I am one of the founders of 3Dfx and one of our goals was
to deliver the highest quality graphics possible to the PC gamer.
It was and still is a very risky proposition because of the cost
sensitivity of the marketplace. But your reviews help convince
me that we did the right thing.
I thought I would share with you a little bit about what is
inside the 3Dfx Voodoo Graphics chipset. There are 2 chips
on the graphics board. Each is a custom designed ASIC containing
approximately 1 million transistors. Although this number of
transistors is on the order of a 486, it is a lot more powerful.
Why? Because the logic is dedicated to graphics and there’s a
lot of logic to boot. For example, bilinear filtering of
texture maps requires reading four 16-bit texels per pixel (that’s
400 Mbytes/sec at 50 Mpixels/sec) and then computing the equation
red_result=r0*w0+r1*w1+r2*w2+r3*w3 where r0:3 are the four red
values and w0:3 are the four weights based on where the pixel
center lies with respect to the four texels. This is performed
for each color channel (red, green, blue, alpha) resulting
in 16 multiplies and 12 additions or 28 operations per pixel.
At 50 Mpixels per second that is 1,400 Mops/sec. The way this
is designed in hardware is you literally place 16 multipliers
and 12 adders on the chip and hook them together. And this is
only a small part of one chip. There are literally dozens of
multipliers and dozens of adders on each of the two chips dedicated
only to graphics. Each chip performs around 4,000 million actual
operations per second, of which around one third are integer
multiplies. These are real operations performed - if you were to
try to do these on a CPU (or a DSP) you must also do things like
load/store instructions and conditional branches. In my estimation it would take about
a 10,000 Mip computer (peak) to do the same thing that one of our
chips does. This is about 20 of the fastest P5-200 or P6-200 chips
per one of our chips. Not exactly cost-effective. So if you want
to brag, you can say your graphics card has approximately the same
compute power as 40 P5-200 chips. Of course, these numbers are more
fun than they are meaningful. What is meaningful in graphics is
what you see on the screen.
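For anyone who wants to check the bilinear filtering arithmetic above, here is a rough sketch in Python. It is only an illustration of the equation, not how the chip is built: the function names are made up, and floating-point weights stand in for the fixed-point multipliers and adders in the hardware.

```python
# Sketch of bilinear filtering for one pixel, following the equation
# result = c0*w0 + c1*w1 + c2*w2 + c3*w3, applied to each color channel.
# Function names and float weights are assumptions for illustration only.

def bilinear_weights(fx, fy):
    """Weights for the four texels around the pixel center.

    fx, fy are the fractional position (0..1) of the pixel center
    between the texel centers; the four weights always sum to 1.
    """
    return [(1 - fx) * (1 - fy),  # top-left texel
            fx * (1 - fy),        # top-right texel
            (1 - fx) * fy,        # bottom-left texel
            fx * fy]              # bottom-right texel


def bilinear_filter(texels, fx, fy):
    """Blend four (r, g, b, a) texels into one filtered color."""
    w = bilinear_weights(fx, fy)
    # The equation needs 4 multiplies and 3 additions per channel.
    return tuple(sum(t[ch] * w[i] for i, t in enumerate(texels))
                 for ch in range(4))


# The throughput arithmetic from the post.
CHANNELS = 4                 # red, green, blue, alpha
TEXELS_PER_PIXEL = 4
BYTES_PER_TEXEL = 2          # 16-bit texels
FILL_RATE_MPIXELS = 50

multiplies = CHANNELS * TEXELS_PER_PIXEL            # 16
additions = CHANNELS * (TEXELS_PER_PIXEL - 1)       # 12
ops_per_pixel = multiplies + additions              # 28
mops_per_sec = ops_per_pixel * FILL_RATE_MPIXELS    # 1,400 Mops/sec
texel_mbytes_per_sec = (TEXELS_PER_PIXEL * BYTES_PER_TEXEL
                        * FILL_RATE_MPIXELS)        # 400 Mbytes/sec
```

The hardware evaluates all 28 of these operations in parallel for every pixel, which is where the 1,400 Mops/sec figure comes from.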
Now of course, if you were writing a software renderer for a game,
you wouldn’t attempt to perform the same calculations we perform on
our chip on a general purpose CPU. You would take shortcuts, like
using 8-bit color with lookup tables for blending, or performing
perspective correction every ‘n’ pixels. The image quality will
depend on how many shortcuts you take and how clever you are.
Voodoo Graphics takes no shortcuts and was designed to give you
the highest quality image possible within the constraint of 2 chips.
As your reviews have shown, it is evident that you can see the
difference in quality and performance.
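To make the "perspective correction every n pixels" shortcut concrete, here is a rough sketch in Python for one texture coordinate along a single span. The function names and parameters are hypothetical and this is not taken from any particular game; it only shows where the error comes from when a software renderer divides once every n pixels instead of once per pixel.

```python
# Sketch: exact per-pixel perspective correction versus dividing only
# every n pixels and interpolating linearly in between.
# All names are hypothetical; spans are assumed to be at least 2 pixels wide.

def exact_span(u_over_z0, inv_z0, u_over_z1, inv_z1, width):
    """Perspective-correct u for every pixel: one divide per pixel."""
    out = []
    for x in range(width):
        t = x / (width - 1)
        uz = u_over_z0 + (u_over_z1 - u_over_z0) * t  # u/z is linear on screen
        iz = inv_z0 + (inv_z1 - inv_z0) * t           # 1/z is linear on screen
        out.append(uz / iz)
    return out


def subdivided_span(u_over_z0, inv_z0, u_over_z1, inv_z1, width, n):
    """Divide only every n pixels; interpolate u linearly in between."""
    def exact_u(x):
        t = x / (width - 1)
        return ((u_over_z0 + (u_over_z1 - u_over_z0) * t) /
                (inv_z0 + (inv_z1 - inv_z0) * t))

    out = []
    x0 = 0
    u0 = exact_u(0)
    while x0 < width - 1:
        x1 = min(x0 + n, width - 1)
        u1 = exact_u(x1)                  # the only divide for this run
        for x in range(x0, x1):
            out.append(u0 + (u1 - u0) * (x - x0) / (x1 - x0))
        x0, u0 = x1, u1
    out.append(u0)                        # last pixel of the span
    return out


def max_error(exact, approx):
    """Largest per-pixel difference between the two spans."""
    return max(abs(a - b) for a, b in zip(exact, approx))
```

A larger n saves more divides but lets the texture swim further from its correct position, which is the quality versus speed trade-off described above.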
Now I am sure the subject of triangle setup and geometry calculations
will come up sooner or later in this newsgroup. Let me make a
preemptive strike and answer your questions before you ask them.
There is no geometry acceleration on the board, where geometry is
defined as geometric transformation and lighting. In the Wizard’s
tower demo you see lighting being applied to texture maps through
the use of a ‘lighting map’, which is another texture map that
contains the results of off-line radiosity calculations. This is
not traditional lighting in the OpenGL sense, but is nonetheless a
very powerful method of performing static lighting. It is becoming
more popular with games and personally, I think it’s great! It
requires bilinear filtering AND high fill rates, both of which
the R3D card has.
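As a rough illustration of the lighting map idea, here is a small Python sketch. The post above does not say how the two maps are combined; multiplying the base texel by the light value is the usual approach and is assumed here, and the function name is made up.

```python
# Sketch: combine a base texture sample with a light map sample.
# Assumption: the light map modulates (multiplies) the base color,
# with both stored as 0..255 per channel. The name is hypothetical.

def apply_light_map(base_texel, light_texel):
    """Scale each (r, g, b) channel of the base color by the light value."""
    return tuple(b * l // 255 for b, l in zip(base_texel, light_texel))
```

Both samples would normally be bilinear filtered first, and drawing two textures per pixel is why the technique leans on a high fill rate.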
Now back to triangle setup. The Voodoo Graphics chipset performs
about 2/3 of triangle setup in hardware. When designing Voodoo
Graphics I carefully studied exactly what triangle setup our design
required. We put the things that are hard for a Pentium to perform
into hardware and left the things that are easy for a Pentium
out of the hardware design. Our triangle engine is also very
efficient in that it requires less setup than most (I worked 9 years
at SGI and have a lot of triangle engine experience). The net
result is that the 1/3 of triangle setup we perform on the Pentium
is not many cycles at all. That is why our triangle numbers are so high.
With an efficient design, you can afford to use the Pentium to perform
some of your triangle setup. With an inefficient design, you cannot.
I know this is a very controversial subject, so I will stop right here.
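For readers unfamiliar with the term, here is a rough Python sketch of one typical triangle setup calculation: finding how fast an interpolated parameter (a color, depth, or texture coordinate) changes per pixel in x and y. This is a generic textbook formulation with made-up names. It says nothing about which parts Voodoo Graphics does in hardware and which parts run on the Pentium.

```python
# Sketch: per-triangle gradients of one interpolated parameter p.
# Generic plane-equation setup; not a description of the actual chip.

def parameter_gradients(v0, v1, v2):
    """Each vertex is (x, y, p). Returns (dp/dx, dp/dy) over the triangle."""
    (x0, y0, p0), (x1, y1, p1), (x2, y2, p2) = v0, v1, v2
    # Twice the signed area of the triangle in screen space.
    area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area2 == 0:
        raise ValueError("degenerate triangle")
    dpdx = ((p1 - p0) * (y2 - y0) - (p2 - p0) * (y1 - y0)) / area2
    dpdy = ((p2 - p0) * (x1 - x0) - (p1 - p0) * (x2 - x0)) / area2
    return dpdx, dpdy
```

This has to be repeated for every interpolated parameter of every triangle, and the divide by the area is the kind of step whose placement matters when deciding what belongs on the CPU.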
I hope this answers some of your questions in advance. Thanks for
buying the board and I hope you enjoy it. As for the flight sims,
I am waiting for one too, and am anxiously waiting to see what our
chips can do. I wrote the original SGI flight simulator and hopefully,
I won’t have to write another one :-}