Summary: Since successfully launching GeForce 6800 Ultra just over a year ago, NVIDIA's been tackling other challenges. Last summer it was the GeForce 6600/6600 GT, then last winter it was SLI. Now NVIDIA is poised to unveil their second-generation shader model 3.0 part, GeForce 7800 GTX, and it's been built to deliver impressive frame rates. But performance isn't the only new item with this new card, as NVIDIA also ups the image quality factor by a notch or two. See how the 7800 GTX stacks up against its competitors in today's article!
The evolutionary progression of GeForce
In all other situations with benchmarks of that time, the GeForce 3 performed roughly on par with GeForce 2 Ultra.
By the time GeForce4 arrived, DX8 applications were finally becoming more common, although with Quake 3, Serious Sam, and Unreal Tournament dominating the shelves, there were still only a handful of programmable titles on the market. With GeForce4, NVIDIA largely built on GeForce3, adding an additional vertex shader for pumping triangles, and making optimizations to the pixel shader and memory subsystem. NVIDIA spiced things up even further by cranking up the clock speeds, particularly on the GeForce4 Ti 4600. GeForce4 cards, particularly the Ti 4200 and Ti 4600, were incredibly popular; to this day, over three years since they were initially released, there are still quite a few of these cards in everyday use.
NVIDIA then stumbled with DirectX 9. Their competitor, ATI, beat them to market with RADEON 9700, and their initial DX9 part, GeForce FX 5800 Ultra, performed slower than the 9700 and ran loud and hot. Ultimately the product was scrapped before it really got off the ground, with NVIDIA introducing GeForce FX 5900 instead. The 5900 somewhat improved the situation for NVIDIA, but it still trailed ATI in performance with DX9 titles. We noted this in our GeForce FX and DirectX 9 Half-Life 2 Performance article, where the 5900 was consistently outgunned by less expensive mainstream DX9 ATI cards like the RADEON 9600 XT.
NVIDIA really got things right with GeForce 6 however. NVIDIA beat ATI to market, delivering shader model 3.0 hardware well before their Canadian competitor (which still hasn’t introduced a shader model 3.0 part, by the way), and in better quantities on the higher-end parts at retail. Also working in the GeForce 6’s favor was software support. Whereas DX8/DX8.1, and DX9 software trailed the hardware by over a year, applications that took advantage of shader model 3.0 were available within months of GeForce 6800’s launch, bucking the trend of software being slow to catch up to the hardware. Crytek even integrated GeForce 6’s OpenEXR high dynamic range lighting into Far Cry.
With dozens more shader model 3.0 titles set to be introduced between now and the end of this year, NVIDIA is now poised to introduce their second generation shader model 3.0 card, GeForce 7800 GTX, formerly codenamed G70. The new GPU boasts better performance and improved image quality over its predecessor. Let’s explore how NVIDIA accomplishes this…
NVIDIA CineFX 4.0 Shading Architecture
24 Pixel Shader Units
Next-Generation Texture Engine
64-Bit Texture Filtering and Blending
430MHz Graphics Core
256MB/256-bit GDDR3 at 600MHz
NVIDIA® Intellisample™ 4.0 Technology, taking image quality to new levels
NVIDIA® UltraShadow™ II Technology
NVIDIA® SLI™ Technology
NVIDIA® PureVideo™ Technology
Composited Desktop Hardware Engine
Advanced Display Functionality
2048x1536 at 85Hz
0.11-micron manufacturing process
302 million transistors
Peak power consumption 100W-110W
As you can see, for GeForce 7800 GTX, NVIDIA has increased the number of pixel and vertex units, up from 16 pixel pipelines in GeForce 6800 Ultra to 24 in GeForce 7800 GTX, and six vertex units in the 6800 Ultra up to eight in GeForce 7800 GTX. Despite this, NVIDIA is adamant that the GeForce 7800 GTX is more than just higher clock speeds and more pipes, with enhancements made to the pixel and vertex units, as well as a new texture engine designed to accelerate texture processing. We’ll go over the changes NVIDIA has implemented on the following pages.
[A good tech demo] should not be a demo that is left running in the background in a continuous loop like NVIDIA's Dawn. There should be an introduction, middle, and end. People should say 'That was cool, I want to see that again' as opposed to 'I've seen enough, show me the next demo.' Don't dwell on a particular camera shot to show off the 16-pass shader that took months to code if the scene flows better cinematically with a quick cut -- make the end user want to watch the demo again to get that second glimpse at that effect.
For their second-generation shader model 3.0 card, NVIDIA has incorporated a number of improvements into the pixel and vertex shading units. As the use of programmable shading has grown more pervasive, the shaders themselves have grown more complex. Whereas early shader programs consisted of just a handful of instructions, they have since grown to 96, and the latest games now run hundreds of instructions per pixel. This is due in large part to the rapid adoption of shader model 2.0b and 3.0 hardware. Beyond the increase in shader length, more complex pixels are also generated per pass.
Basically, today’s programs require far more math per pixel than the first generation of programmable titles. With this in mind, NVIDIA emphasized delivering more math per pipeline as well as more math per clock in GeForce 7800 GTX’s design.
Enhancing the pixel and vertex shaders
For GeForce 7800 GTX, NVIDIA analyzed over 1,300 of the most commonly used shaders and made architectural decisions based on this data. For instance, one trait NVIDIA discovered is that multiply-add (MADD) operations are among the most commonly used math functions today. These computations are commonly used for lighting (for example, in effects like refraction, reflection, and embossing), normal map calculations (adding apparent depth and height to what are actually flat objects via normal maps rather than geometry), and many other operations.
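To illustrate why MADDs dominate shader workloads (this is our own sketch of the arithmetic involved, not NVIDIA's hardware), consider a basic per-pixel diffuse lighting term: the dot product between a surface normal and a light vector is nothing but a chain of multiply-adds.

```python
# Illustrative sketch (ours, not NVIDIA's): a per-pixel N . L diffuse
# lighting term decomposed into multiply-add (MADD) operations.

def madd(a, b, c):
    """One multiply-add: a * b + c."""
    return a * b + c

def dot3(n, l):
    """3-component dot product expressed as three chained MADDs."""
    acc = 0.0
    for i in range(3):
        acc = madd(n[i], l[i], acc)
    return acc

# Example: surface normal pointing straight up, light 45 degrees overhead.
normal = (0.0, 1.0, 0.0)
light = (0.0, 0.7071, 0.7071)
diffuse = max(dot3(normal, light), 0.0)  # clamp back-facing light to zero
print(diffuse)
```

A hardware unit that can issue one MADD per clock finishes this dot product in three operations, which is why an architecture tuned for MADD throughput pays off across lighting and normal-map math alike.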
According to NVIDIA, all these enhancements add up to 50% more individual pipeline efficiency clock-for-clock. When you factor in the additional pipes that have been added, the 7800 GTX delivers a considerable performance improvement over its predecessor.
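Taking NVIDIA's claim at face value, a quick back-of-the-envelope calculation (our arithmetic, not a figure NVIDIA quotes) shows what the extra pipes, the efficiency gain, and the modest clock bump combine to in theoretical shading throughput:

```python
# Back-of-the-envelope theoretical shading-throughput ratio of
# GeForce 7800 GTX over a stock GeForce 6800 Ultra, assuming NVIDIA's
# claimed 50% per-pipeline, per-clock efficiency gain holds.

pipes_6800, clock_6800 = 16, 400e6   # stock GeForce 6800 Ultra
pipes_7800, clock_7800 = 24, 430e6   # GeForce 7800 GTX
efficiency_gain = 1.5                # claimed 50% more work per pipe per clock

per_clock = (pipes_7800 / pipes_6800) * efficiency_gain   # 2.25x per clock
overall = per_clock * (clock_7800 / clock_6800)           # ~2.42x overall

print(f"per-clock: {per_clock:.2f}x, overall: {overall:.2f}x")
```

In other words, even though the core clock only climbed 30MHz, the paper math suggests well over double the per-clock shading muscle, which lines up with why NVIDIA insists this is more than a clock-speed refresh.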
In order to improve texture performance, NVIDIA has improved texture fetching, allowing the graphics core to grab and access textures faster. This is particularly important when dealing with large textures. Other improvements NVIDIA has incorporated include a 30%+ reduction in cycle-time latencies in the fixed-function setup stages, along with similar cycle reductions in gamma-adjusted, rotated-grid AA.
As we’ve outlined in previous articles, TSMC’s 0.11-micron process is built for value, not high clock speeds (as 0.13-micron was). This means that performance-enhancing features found at 0.13-micron, such as low-k dielectrics, aren’t present at 0.11-micron. In case you don’t know, a low-k dielectric is a material used to insulate the copper circuits within the graphics core. This is important because TSMC’s 0.13-micron process packs the circuits within the chip more tightly together. As clock speeds increase, these circuits can begin to interfere with one another in the same way crosstalk can occur on telephone lines. This form of electrical crosstalk can hamper performance and waste power.
Low-k dielectric material is used to encapsulate the copper wires from each other, ensuring better performance (and thus, higher clock speeds) and lower power requirements. TSMC reserves low-k for their 0.09-micron and 0.13-micron processes, charging their customers such as NVIDIA more for this feature.
0.11-micron is essentially TSMC’s die shrink of 0.13-micron without low-k, and therefore without the price premium. For GeForce 6600 and RADEON X700, this allowed ATI and NVIDIA to incorporate more pipelines into these value parts affordably.
For GeForce 7800 GTX, NVIDIA is essentially doing the same, using the smaller process to more affordably incorporate more features into GeForce 7800 GTX. In this case, that means more pixel and vertex pipelines; with the chip encompassing a whopping 302 million transistors! In comparison, GeForce 6800 Ultra featured over 220 million, while an Athlon 64 FX CPU contains roughly 106 million.
500MHz or bust?
The only real downside to 0.11-micron is, as we mentioned, lower clock speeds. ATI ran into problems getting the X700 XT to yield in sufficient quantities, while the X800 XL is clocked at 400MHz with 16 pipelines.
NVIDIA clocks the GeForce 7800 GTX’s graphics core at 430MHz, an improvement of only 30MHz over the stock GeForce 6800 Ultra (we say “stock” because many of NVIDIA’s 6800 Ultra board partners chose to clock the core of their boards at 425MHz). The memory subsystem runs at 600MHz (1.2GHz effective), 50MHz higher than the GeForce 6800 Ultra, providing up to 38.4GB/sec of peak memory bandwidth (versus 35.2GB/sec on the GeForce 6800 Ultra).
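Those bandwidth figures fall straight out of the bus width and the effective (double data rate) memory clock; the quick calculation below reproduces both numbers:

```python
# Peak memory bandwidth = bus width (in bytes) x effective DDR clock.
# Reproduces the figures quoted above for both cards.

def peak_bandwidth_gb(bus_bits, mem_clock_mhz):
    """Return peak bandwidth in GB/s for a GDDR3 bus (DDR: two transfers per clock)."""
    bytes_per_transfer = bus_bits / 8   # 256-bit bus -> 32 bytes per transfer
    effective_mhz = mem_clock_mhz * 2   # double data rate
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

print(peak_bandwidth_gb(256, 600))  # GeForce 7800 GTX: 38.4 GB/s
print(peak_bandwidth_gb(256, 550))  # GeForce 6800 Ultra: 35.2 GB/s
```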
At first, these figures may not seem that significant for a “next-generation” product, but when you factor in the efficiency improvements NVIDIA has incorporated into the shading and texture units, the card should come closer to hitting its theoretical specs. That’s the theory at least…
As you can see, GeForce 7800 GTX is the first high-end card NVIDIA has released in quite some time to feature single-slot cooling. This probably comes as welcome news to those of you with cramped cases or small form factor PCs. Overclockers, however, may long for the larger dual-slot coolers, even if most enthusiasts leave the slot adjacent to their graphics card empty anyway. Fortunately, we’re glad to report that the GeForce 7800 GTX, like the 6800 GT, doesn’t need a dual-slot cooler. But more on this a little later…
The heatsink/fan unit used on the GeForce 7800 GTX resembles the cooler used on the GeForce 6800 GT, only it’s much longer, 6.7” on the GeForce 7800 GTX versus 6” on the GeForce 6800 GT. The heatsink itself is composed entirely of aluminum, with a clear plastic duct placed over the top of the heatsink’s fins.
Air from within your case is channeled from the card’s fan down this duct, and out across the right side of the card. NVIDIA then uses a second black aluminum heatsink to cool the VRM circuitry on the right side of the board, just below the PCI Express power connector. An aluminum plate on the underside of the card draws heat off the board’s memory modules, while an additional plate directly underneath the GPU provides additional cooling as well as holding the heatsink/fan unit on the top of the card in place.
In practice, the cooler performs well. We witnessed GPU temperatures peak at 66 degrees Celsius under load, a marked improvement over some of the GeForce 6800 Ultra temperatures we’ve seen (we’ve also seen 6800 GTs hit over 65 degrees Celsius). Board temperatures are lower as well.
This is critical for SLI. As anyone who’s run dual GeForce 6800 Ultras in SLI can tell you, the master 6800 Ultra card can get quite hot, especially under load, as the slave card underneath it essentially cuts off its entire air supply, preventing its ducted cooling system from working effectively. While the GeForce 7800 GTX continues to rely on ducted cooling, its single slot design frees up more room between the master and slave boards, supplying slightly more air for the card’s fan, while the chip itself runs cooler and consumes less power.
NVIDIA claims that the GeForce 7800 GTX’s peak power consumption is up to 110 watts, versus 120 watts for the GeForce 6800 Ultra. As a result, NVIDIA recommends a power supply of at least 350W with 22 amps on the 12V rail, while a 500W PSU with 30 amps on the 12V rail should suffice for SLI configurations. We used a 520W OCZ ModStream PSU for all of our testing.
Finally, LCD users will be happy to see that the GeForce 7800 GTX features dual DVI connections. Hopefully board partners will also provide HDTV as well as video input on their retail boards.
Both modes are designed to increase the quality of AA, with transparency adaptive multisampling being geared more towards performance.
Transparency adaptive supersampling and multisampling are designed to enhance the image quality of thin-lined objects. A common example we’re using in our screenshots today is chain-link fences, but other examples include leaves and, to a lesser extent, branches on trees, strands of grass, and other types of foliage. Both of these new AA methods key off the alpha channel to sharpen these types of objects.
The one key difference between the two is that with transparency adaptive multisampling, only one texel sample is used to calculate the surrounding subpixel values, sacrificing a little bit of image quality in order to improve performance. As any of you who have tried the RADEON 8500’s supersampling can attest, supersampling can have a huge impact on your frame rate.
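A simplified model of the idea (our own approximation, not NVIDIA's actual hardware algorithm) shows why extra alpha-test samples smooth these cut-out edges. With an alpha-tested texture like a chain-link fence, a single alpha evaluation per pixel decides the whole pixel is either wire or air, giving a hard jagged edge; evaluating the test at several subsample positions instead yields partial coverage along the edge:

```python
# Simplified sketch (not NVIDIA's actual algorithm) of why supersampling
# the alpha test smooths alpha-tested edges such as chain-link fences.

def alpha_test(u):
    """Toy 1-D alpha channel: texture is opaque (wire) for u >= 0.5."""
    return 1.0 if u >= 0.5 else 0.0

def coverage_one_sample(pixel_center):
    # One alpha evaluation at the pixel center decides the whole pixel:
    # fully opaque or fully transparent, hence jagged edges.
    return alpha_test(pixel_center)

def coverage_supersampled(pixel_center, n=4, pixel_width=0.25):
    # Evaluate the alpha test at n subsample positions spread across the
    # pixel and average them, yielding fractional (smooth) coverage.
    offsets = [(-0.375 + 0.25 * i) * pixel_width for i in range(n)]
    return sum(alpha_test(pixel_center + o) for o in offsets) / n

edge_pixel = 0.5  # a pixel centered exactly on the alpha cut-out edge
print(coverage_one_sample(edge_pixel))    # all-or-nothing coverage
print(coverage_supersampled(edge_pixel))  # fractional coverage on the edge
```

The multisampling variant trades some of this quality back for speed by deriving the subpixel coverage from a single texel sample, which is consistent with what our screenshots show below.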
We ran benchmarks on the GeForce 7800 GTX in both modes on page 8 of this article (all of the GeForce 7800 GTX benchmarks after page 8 use the card’s traditional AA for testing), and as you’ll see, the performance impact of transparency adaptive supersampling isn’t that significant, in part due to new algorithms NVIDIA has implemented in their AA engine, which apply supersampling only to select parts of the image. We’ll let the screenshots and performance benchmarks speak for themselves though. We should also note that NVIDIA’s ForceWare 75 driver provides gamma correction, which can be toggled on or off. We’ve provided screenshots with and without gamma correction; if you want the best IQ, you should leave it on:
Even without zooming in, you can see the benefits NVIDIA’s new transparency adaptive supersampling mode brings in our Half-Life 2 screenshots. The area immediately surrounding the front of the fence looks much sharper:
GeForce 7800 GTX 4xAA Traditional
GeForce 7800 GTX 4xAA Transparency Adaptive Supersampling
We can’t really say that transparency adaptive multisampling looks that much better though:
GeForce 7800 GTX 4xAA Transparency Adaptive Multisampling
We’re hard-pressed to see any significant differences between NVIDIA’s traditional AA mode and the new transparency adaptive multisampling mode. Let’s see what happens when we zoom in at 300% however:
GeForce 7800 GTX 4xAA Traditional
GeForce 7800 GTX Transparency Adaptive Supersampling
GeForce 7800 GTX Transparency Adaptive Multisampling
Again, while the supersampling mode looks gorgeous, multisampling really doesn’t seem to bring that much more to the IQ table. We still couldn’t see much of a difference with 500% zoom either.
But how does the competition stack up against NVIDIA’s latest form of eye candy? Feast your eyes on this:
RADEON X850 XT PE 4xAA
GeForce 7800 GTX Transparency Adaptive Supersampling
It doesn’t take a 300% zoom to see that the NVIDIA card looks better. Let’s examine the performance impact of NVIDIA’s new transparency adaptive multisampling mode however.
Half-Life 2 – Direct3D
It’s interesting to see that multisampling comes with practically no performance hit, while supersampling isn’t all that bad either, costing only 7% at 1600x1200 with 4xAA and 16xAF. If it were up to us, we’d leave NVIDIA’s new transparency adaptive multisampling mode on, at least in Half-Life 2. We’ll have to run some tests in one of Far Cry’s really thick jungles in the future. We should note that all AA benchmarks from the next page onward utilize NVIDIA’s traditional AA mode for the GeForce 7800 GTX.
Pacific Fighters (kamikaze demo)
3DMark 05 – Direct3D
3DMark 05 – Direct3D
Pacific Fighters - OpenGL
Far Cry – Direct3D
Far Cry – Direct3D
IL-2: FB – OpenGL
LOMAC – Direct3D
DOOM 3 – OpenGL
Half-Life 2 – Direct3D
Splinter Cell – Direct3D
Battlefield 2 – Direct3D
The shader-heavy applications in our testing suite really took advantage of the GeForce 7800 GTX. You saw this in the case of Far Cry, particularly once HDR was enabled: a single GeForce 7800 GTX running Far Cry with HDR was able to outperform two GeForce 6800 Ultra cards running in SLI mode! The other two applications where the GeForce 7800 GTX put up a particularly strong showing were Battlefield 2 and Half-Life 2. In Battlefield 2, the 7800 GTX outperformed the 6800 Ultra SLI configuration in all but one test (2048x1536 with 4xAA/16xAF), while in Half-Life 2 it largely kept up with the SLI config, matching it in performance at 2048x1536.
With more shader-heavy games right around the corner, it’s pretty clear which architecture is better built for the long haul. The improvements NVIDIA has implemented in CineFX 4.0 paid huge dividends for the 7800 GTX, despite the so-called “modest” boost in clocks. Even more incredible is that NVIDIA is able to deliver all this performance in a single-slot package with lower power draw and better thermals than their previous high-end product, the GeForce 6800 Ultra.
Besides the performance and power story, NVIDIA also delivers superior image quality thanks to their new transparency adaptive supersampling mode. By taking additional texel samples and antialiasing passes, NVIDIA is able to remove the jaggies often found on thin-lined objects such as chain-link fences and foliage. Just take a look at the Half-Life 2 screenshots from page 7 for an example.
GeForce 7800 GTX would best be summed up as an evolutionary product with revolutionary performance, much like the GeForce4 a few years ago. When you add on the enhanced image quality delivered by transparency AA, the package is even sweeter. With each card selling for $600, we probably wouldn’t recommend dropping the money on a 7800 GTX SLI setup until faster processors arrive from AMD and Intel, though. Clearly we were CPU-bound in multiple cases with the SLI config at practically all resolutions. Those of you with high-end LCDs who do decide to go the SLI route may want to start shopping for a nice CRT capable of 2048x1536.
As it stands now with today’s latest applications, the GeForce 7800 GTX is definitely up for the challenge. Even with 4xAA and 16xAF thrown on for good measure!
|© Copyright 2003 FS Media, Inc.|