
NVIDIA GeForce 8800 GTX/GTS Performance Preview
November 08, 2006

Summary: NVIDIA's next-generation DirectX 10 GPU has finally arrived! The GeForce 8800 GTX boasts 128 shading units running at over 1.3GHz, and a 384-bit memory interface with 768MB of memory. Read all the details about the new architecture and its performance in this article!


Introduction (Page 1 / 24)

GeForce 8800: the DirectX 10 era begins

From dictionary.com:
inflection point:
n. A moment of dramatic change, especially in the development of a company, industry, or market.


Now, in 2006, and more importantly in 2007 when the software arrives, PC graphics and gaming are going to take another huge leap forward with the next generation of DirectX 10 hardware. NVIDIA’s GeForce 8800 line is the first to be designed to take advantage of this, but NVIDIA has also built in plenty of goodies to make the GeForce 8800 an exciting graphics chip for powering today’s latest DirectX 9 and OpenGL 2.0 titles.

[image]


Before we go into the details on this new GPU however, let’s first quickly go over what makes DirectX 10 so special.

[image]

As we mentioned in our DirectX 10 Preview article, DirectX 10 has been completely redesigned from the ground up: no piece of the API was left untouched on the graphics side. Because of this, DirectX 10 boasts several new features that are designed to not only improve image quality, but also performance. Here are the key highlights:

  • New driver model: Under DX10 the driver is split into two parts: the user mode driver and the kernel mode driver. The kernel mode driver is kept distinct from the user mode driver to enhance stability.
  • Brand new Geometry Shader added to the middle of the pipeline, in between the vertex and pixel shaders.
  • Increased efficiency, fixing the “small batch problem”. (Microsoft claims performance improvements of up to six times over DirectX 9 hardware running on Windows XP because of this.) The result is less overhead on the processor (the CPU offloads work to the GPU), which makes it possible to put more objects on the screen. This increases realism and performance in newer games.
  • Virtualized memory for the GPU. The video card will be able to use space in system RAM to store information that does not fit on local video card memory.
  • Shader Model 4.0 has a broader instruction set, including integer and bitwise instructions, transferring more work to the GPU.
  • Fixed function pipeline is gone. Everything is now programmable (done with shaders).
  • Consistency: Capability bits, which were used to tell DirectX what features the GPU did and did not support, are gone. With cap bits gone, hardware manufacturers have fewer ways to deviate from spec. These stricter feature requirements ensure that all video cards meet the same baseline; there are only a few optional features such as multisample anti-aliasing. (For example, in the early days of DirectX 9 there was lots of variation in floating-point formats (FP16, FP24, FP32), which led to confusion among software developers.)

  • HDR Lighting – Two new floating-point HDR formats for DX10-compliant GPUs: DirectX 10 adds support for HDR formats that represent HDR data more compactly, making it possible to use HDR more efficiently.

  • Virtualized memory for the GPU – In the past, the amount of texture storage was limited by the amount of onboard memory the graphics processor held. Now textures can be stored on system memory, eliminating the memory bottleneck on texture size.

  • Better geometry instancing – Geometry instancing, first introduced to DX9 in Shader Model 3.0, has been tweaked in DirectX 10. The enhancements give developers more customization, for example unique animations for objects rendered via instancing (like ground units in an RTS game).


  • Larger textures – The maximum texture dimensions have increased in DX10. They were 2048x2048 or 4096x4096 in DirectX 9, and in DX10 they're 8192x8192.
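
To put the larger texture limit and GPU memory virtualization in perspective, here is a rough sketch of how much memory a single maximum-size texture would take. The 4 bytes per texel figure (uncompressed 32-bit RGBA, no mipmaps) and the helper name texture_mib are our own assumptions for illustration, not numbers from Microsoft or NVIDIA.

```python
# Assumption: uncompressed 32-bit RGBA texels (4 bytes each), no mipmaps.
BYTES_PER_TEXEL = 4


def texture_mib(width, height, bytes_per_texel=BYTES_PER_TEXEL):
    """Return the size of one uncompressed texture in MiB."""
    return width * height * bytes_per_texel / (1024 * 1024)


for side in (2048, 4096, 8192):
    print(f"{side}x{side}: {texture_mib(side, side):.0f} MiB")
# 2048x2048: 16 MiB
# 4096x4096: 64 MiB
# 8192x8192: 256 MiB
```

Under these assumptions a single 8192x8192 texture would take a third of the 768MB on a GeForce 8800 GTX, which is one reason the ability to spill into system RAM matters.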

    [image]



    Specifications (Page 2 / 24)

    NVIDIA Unified Architecture with GigaThread technology
    Fully unified shading core dynamically allocates shading processing power to deliver incredibly true-to-life 3D characters and environments. NVIDIA’s ground-breaking GigaThread™ technology implemented in GeForce 8 Series GPUs supports thousands of independent, simultaneously executing threads, maximizing GPU utilization.

    Stream processing
    Stream processing is a relatively new computing paradigm that enables parallel processing of a defined series of operations on multiple data streams with extreme levels of efficiency and performance. The shader core of the GeForce 8800 GPUs is comprised of 128 1.35GHz stream processors all working in parallel to deliver unmatched gaming performance. Streaming is the most efficient architecture for graphics. Streaming has evolved with graphics and the GeForce 8 Series is the next generation of a streaming architecture. The GeForce 8800 is a unified architecture where geometry, vertex, and pixel programs share common stream processing resources.

    [image]


    Full Microsoft DirectX 10 Shader Model 4.0 Support
    World’s first DirectX 10 GPU with full Shader Model 4.0 support delivers unparalleled levels of graphics realism and film-quality effects

    OpenGL 2.0 Optimizations and support

    NVIDIA Quantum Effects physics processing technology
    Advanced shader processors architected for physics computation enable a new level of physics effects to be simulated and rendered on the GPU -- all while freeing the CPU to run the game engine and AI


    NVIDIA Lumenex Engine
    Delivers stunning image quality and floating point accuracy at ultra-fast frame rates:
  • 16x Anti-aliasing: Lightning fast, high quality anti-aliasing at up to 16x sample rates obliterates jagged edges
  • True 128-bit floating-point high dynamic range lighting: Realistic HDR lighting effects with twice the precision of previous generations, now with support for anti-aliasing

    NVIDIA nView Multi-Display technology
    Advanced technology provides the ultimate in viewing flexibility and control for multiple monitors

    NVIDIA PureVideo HD Technology
    PureVideo HD can deliver 720p, 1080i, and 1080p high definition output and supports both 3:2 and 2:2 pulldown (inverse telecine) of HD interlaced content. PureVideo HD on GeForce 8800 GPUs now provides HD noise reduction and HD edge enhancement.

    NVIDIA ForceWare Unified Driver Architecture

    Two dual-link DVI outputs support two 2560x1600 resolution displays

    [image]

    Built for Windows Vista
    NVIDIA SLI Technology
    320-bit and 384-bit memory interfaces with GDDR3 memory
    HDCP capable
    90-nm manufacturing process @TSMC
    681 million transistors

    Notes

    While some were expecting G80 to be built at 80-nm, NVIDIA's new G80 GPU powering the GeForce 8800 is built on TSMC's 90-nm manufacturing process, just like G71 was. The GPU packs in over 680 million transistors; that's over twice the number of transistors in G70, which weighed in at 302 million. Why so many transistors? Among other things, the new GPU packs in 128 shading units along with dedicated L1 and L2 caches, as well as new control logic to handle the GPU's unified shader architecture. You'll also note the new 384-bit memory interface for GeForce 8800 GTX. In their G80 FAQ NVIDIA discusses this:

    Q: Why did NVIDIA choose to implement a 384-bit memory interface on GeForce 8800 GPUs? Why not 512-bit or 256-bit?
    A: With GeForce 8800 GPUs, our goal was to create the highest performing GPU within a given set of design constraints including power, size, cost, design technology, and board-level requirements. Developing the highest performing GPU requires implementing an architecture that is balanced between shader, texture, ROP, and frame buffer width. Through our performance analysis across hundreds of applications and shaders, we determined that a 384-bit memory interface when combined with 128 stream processors and 64 pixels per clock of texture filtering provided the fastest, most balanced processor while meeting all the various design constraints.


    With so many transistors onboard, peak power consumption for G80 is about 177W; in comparison, the G70 inside the 7800 GTX topped out at 110W. We'll be going over the changes in the new architecture on the following pages...


    New architecture (Page 3 / 24)

    According to NVIDIA, work began on the G80 GPU powering the GeForce 8800 GTX and GeForce 8800 GTS in the summer of 2002. G80 is a massively parallel, unified shader design which combines NVIDIA’s stream processors with their so-called GigaThread technology to deliver these new levels of performance. Unlike the GeForce 7900 GTX, this is a completely new architecture designed from scratch for DirectX 10. In fact, NVIDIA boasts that the GeForce 8800 GTX delivers 2X the performance of the GeForce 7900 GTX in current applications, and up to 11X the performance in certain shading operations. Let’s go over some of the key features in this new architecture.

    [image]


    Unified shader architecture

    In previous graphics architectures, both ATI and NVIDIA incorporated a number of distinct pixel and vertex shading units, each dedicated solely to its particular task. Pixel shaders worked on pixel operations, while the vertex shading units dealt only with vertices. There was no mixing or sharing of work between them.

    This was done because in previous versions of DirectX, pixel and vertex shaders weren’t created equal. Pixel and vertex units worked on separate instruction sets that were tailored to those specific applications; pixel and vertex shaders for instance supported different instruction limits and constant registers.

    Under DirectX 10, however, all shaders rely on the same instruction set and support the same number of registers and inputs. Each shader is a general purpose programmable floating-point shader, whether it’s pixel, vertex, or geometry. In other words, no one shading unit is more functional than the other. This allows the shaders to operate on any type of data, whether it’s a pixel program or a vertex program. As a result, both performance and efficiency increase.

    [image]


    Take the example above for instance. Under previous graphics architectures, the vertex shaders in the top scenario are heavily taxed, while the pixel shaders are basically idling. Considering the 3:1 ratio of pixel to vertex shaders in previous NVIDIA GPUs, this equates to a large portion of the GPU’s shading engine essentially being unused!


    In the bottom (water) scenario it’s the exact opposite: the vertex shaders are idling while the pixel shaders are working full tilt.

    [image]

    Under a unified architecture, the shading units can work on any task, so if it’s pixel-intensive the shaders can be assigned accordingly or vice versa if it’s vertex-intensive. This can lead to substantial performance and efficiency gains, and just as importantly, it’s invisible to software developers. GeForce 8800’s unified shaders can also be used seamlessly with DirectX 9 and older DirectX versions, as well as OpenGL.
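
To illustrate why load balancing matters, here is a toy model of the two scenarios. It is only a sketch under our own assumptions: a hypothetical GPU with 32 shading units, split 24 pixel to 8 vertex (the 3:1 ratio mentioned earlier) in the fixed design, where every unit completes one unit of work per time step. The workload numbers are made up, and real GPUs have scheduling, texture, and memory costs this model ignores.

```python
# Toy model: each shading unit finishes one unit of work per time step.
# Assumption: a hypothetical 32-unit GPU, split 24 pixel : 8 vertex (3:1)
# in the fixed design, or fully shared in the unified design.
PIXEL_UNITS = 24
VERTEX_UNITS = 8
TOTAL_UNITS = PIXEL_UNITS + VERTEX_UNITS


def fixed_time(vertex_work, pixel_work):
    """Frame time when vertex and pixel work can only run on their own units."""
    return max(vertex_work / VERTEX_UNITS, pixel_work / PIXEL_UNITS)


def unified_time(vertex_work, pixel_work):
    """Frame time when any unit can take any work (perfect load balancing)."""
    return (vertex_work + pixel_work) / TOTAL_UNITS


scenarios = {
    "vertex-heavy": (240, 80),  # (vertex_work, pixel_work), made-up numbers
    "pixel-heavy": (16, 304),
}
for name, (v, p) in scenarios.items():
    f, u = fixed_time(v, p), unified_time(v, p)
    print(f"{name}: fixed={f:.1f} unified={u:.1f} speedup={f / u:.2f}x")
# vertex-heavy: fixed=30.0 unified=10.0 speedup=3.00x
# pixel-heavy: fixed=12.7 unified=10.0 speedup=1.27x
```

In this model the fixed design is only as fast as its busiest pool of units, while the unified design spreads the same total work across everything. The gain is largest when the workload is lopsided toward the smaller pool.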

    Before we go further, it’s important to note that unified shader architecture is not a requirement of DirectX 10. Technically, DX10 only requires a unified instruction set. But according to NVIDIA: “GeForce 8800 engineers believed a unified GPU shader architecture made most sense to allow effective DirectX 10 shader program load-balancing, efficient GPU power utilization, and significantly improved GPU architectural efficiency.”



    128 Stream Processors (Page 4 / 24)

    At the heart of NVIDIA’s new G80 GPU are its new shading units, which NVIDIA dubs “stream processors”. As we just mentioned, they’re equally at home operating on vertex, pixel, physics, or geometry shading. Controlling it all is the GPU’s dispatch and control logic, which can dynamically assign shading tasks for the greatest efficiency.

    Stream Processors and GigaThread


    For G80 the keyword is scalar computations. NVIDIA’s engineers found that scalar computations were becoming increasingly common, and that they were difficult to compile and schedule efficiently on a traditional vector-based GPU like G70. Therefore, for G80’s shading processors (the so-called “stream” processors), NVIDIA incorporated a scalar architecture. Vector-based shader programs are converted to scalar operations inside G80 to ensure efficiency.

    Each stream processor can dual-issue a MAD and MUL instruction, and supports IEEE-754 floating-point.


    [image]


    Shown above is a block diagram of G80. Those little green squares in the diagram are the stream processors and, as you can see, each group of stream processors has its own dedicated texture address and filtering units as well as L1 cache. The stream processors are arranged into groups of sixteen. For ease of use we’ll refer to each of these groups of sixteen as a “bank” of stream processors. With 16 stream processors per bank, and 8 banks total, that adds up to 128 stream processors inside the GeForce 8800 GTX. Two banks are deactivated in the GeForce 8800 GTS, for a grand total of 96 stream processors.

    The stream processors run at their own clock speed that’s independent of the rest of the graphics core. In the GeForce 8800 GTX for instance, the stream processors run at 1.35GHz, while the rest of the GPU runs at 575MHz. If you recall, NVIDIA decoupled the clocks on G70/G71 as well, where the vertex shaders ran slightly faster than the rest of the graphics core.
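
Putting the bank count, the dual-issue capability, and the shader clock together gives a rough theoretical peak for shader throughput. This is our own back-of-the-envelope sketch, not an NVIDIA figure: it assumes a MAD counts as two floating-point operations and a MUL as one, and that every stream processor issues both on every shader clock, which real workloads will not sustain. The 1200MHz shader clock for the GTS comes from the spec chart later in this article.

```python
# Assumptions: a MAD counts as 2 floating-point operations and a MUL as 1,
# and every stream processor dual-issues both on every shader clock.
FLOPS_PER_CLOCK = 2 + 1
PROCESSORS_PER_BANK = 16


def peak_gflops(active_banks, shader_clock_mhz):
    """Theoretical peak shader throughput in GFLOPS."""
    processors = active_banks * PROCESSORS_PER_BANK
    return processors * shader_clock_mhz * FLOPS_PER_CLOCK / 1000


print(peak_gflops(8, 1350))  # 8800 GTX: 128 processors -> 518.4
print(peak_gflops(6, 1200))  # 8800 GTS:  96 processors -> 345.6
```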

    GigaThread technology refers to G80’s use of threading. In all honesty both ATI and NVIDIA’s GPUs have supported this feature in the past, although it’s been widely regarded that ATI’s R5xx GPUs were capable of handling many more threads per pixel shader quad while also utilizing much finer threading than NVIDIA G70 and NV40.

    NVIDIA doesn’t provide any specifics on how threading has been improved in G80, other than to say that “thousands” of threads can be in flight within G80 at any given point. NVIDIA does boast finer thread granularity, however: while ATI’s R580 provided a granularity of 48 pixels, NVIDIA claims a 32 pixel granularity for pixel shader programs.


    The rest of the GPU details

    G80’s texture filtering units are completely decoupled from the stream processors and can deliver up to 64 pixels per clock for raw texture filtering (in comparison to 24 in G70/71), 32 bilinear-filtered pixels per clock, and 32 pixels per clock of 2X anisotropic filtering. GeForce 8800 GTX has six ROPs (the GTS has five), and each can render four pixels yielding 24 ROPs (effective) in 8800 GTX, and 20 ROPs (effective) in 8800 GTS.
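
The per-clock figures above translate into peak rates once you multiply by the clock speed. The sketch below is ours and assumes the ROPs and texture units run at the 575MHz core clock of the 8800 GTX and that each effective ROP writes one pixel per clock. The name "ROP partition" for each of the six (or five) four-pixel ROPs is our own shorthand.

```python
PIXELS_PER_ROP_PARTITION = 4  # each ROP can render four pixels


def effective_rops(rop_partitions):
    """Effective ROP count from the number of four-pixel ROPs."""
    return rop_partitions * PIXELS_PER_ROP_PARTITION


def peak_rate_gpix(per_clock, core_clock_mhz):
    """Peak rate in billions of pixels (or texels) per second."""
    return per_clock * core_clock_mhz / 1000


gtx_rops = effective_rops(6)
gts_rops = effective_rops(5)
print(gtx_rops, gts_rops)             # 24 20
print(peak_rate_gpix(gtx_rops, 575))  # ROP pixel rate, 8800 GTX -> 13.8
print(peak_rate_gpix(32, 575))        # bilinear-filtered rate, 8800 GTX -> 18.4
```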

    We’ll discuss the memory subsystem in more detail on the next page, as it’s a little different between the 8800 GTX and 8800 GTS, but we can say that G80 continues to utilize a high-speed crossbar design with 64-bit memory controllers (for R520/R580 ATI uses 32-bit controllers, allowing the memory controller to serve more read/write requests simultaneously). The controllers themselves support DDR1, DDR2, DDR3, GDDR3, and GDDR4 memory types.

    Quantum Effects

    Before we discuss the differences between the 8800 GTX and GTS, and the cards themselves, we first wanted to briefly discuss NVIDIA’s “Quantum Effects” technology. This term merely refers to NVIDIA’s use of Havok FX for physics processing on the GPU. If you recall, Havok FX brings the world of physics processing to any Shader Model 3.0 (or greater) GPU. This includes the GeForce 6 and 7 series, although in those cases with lower performance. As we mentioned earlier, physics processing is handled by the G80 GPU’s stream processors, and not any dedicated physics processing unit.


    The 8800 GTX and GTS (Page 5 / 24)

    For G80 NVIDIA has currently planned two card SKUs for today’s launch: the GeForce 8800 GTX, and the GeForce 8800 GTS. We’ve gone over some of the key differences between the two GPUs on the previous pages, but this chart sums things up nicely:

    GeForce 8800 Series Specs

                                                          GeForce 8800 GTX            GeForce 8800 GTS
    # of Transistors                                      681M                        681M
    Core Clock (including dispatch, texture units, ROPs)  575MHz                      500MHz
    Shader Clock (Stream Processors)                      1350MHz                     1200MHz
    # of Shaders (Stream Processors)                      128                         96
    Memory Clock                                          900MHz (1.8GHz effective)   800MHz (1.6GHz effective)
    Memory Interface                                      384-bit                     320-bit
    Memory Bandwidth                                      86.4 GB/s                   64 GB/s
    # of ROPs                                             24                          20
    Memory Size                                           768MB                       640MB
    Price                                                 $599 MSRP                   $449 MSRP
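
The memory bandwidth figures in the chart follow directly from the interface width and the effective memory clock. A minimal sketch of that arithmetic (the function name is ours, and 1 GB is taken as 10^9 bytes):

```python
def bandwidth_gb_s(bus_width_bits, effective_clock_mhz):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * effective_clock_mhz / 1000


print(bandwidth_gb_s(384, 1800))  # 8800 GTX -> 86.4
print(bandwidth_gb_s(320, 1600))  # 8800 GTS -> 64.0
```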


    As you can see, the transistor count is the same for the GeForce 8800 GTX and 8800 GTS; this is because they’re both the exact same G80 GPU. As we outlined earlier, the key difference is that NVIDIA disables two banks of stream processors on the GTS, or 32 shaders total. This cuts the number of functional shading units from 128 in the GeForce 8800 GTX down to 96 in the GeForce 8800 GTS. NVIDIA also disables one ROP.

    Clock speeds and the memory subsystem are also slightly different between the two cards, as the GeForce 8800 GTX core clock speed is 575MHz versus 500MHz in the GeForce 8800 GTS, while the shading units on the GTX board run at 1350MHz versus 1200MHz on the 8800 GTS. NVIDIA also uses a narrower 320-bit memory interface on the GeForce 8800 GTS with 640MB of slower 800MHz memory. The GeForce 8800 GTX boasts a 384-bit memory interface outfitted with 768MB of memory running at 900MHz. And of course, you’ll no doubt notice the difference in price.

    The boards themselves are completely different as well:

    [image]


    As you can see, NVIDIA’s incorporated a black PCB for the GeForce 8800 reference design (a first for an NVIDIA reference board), with dual-slot cooling on both the GeForce 8800 GTX and the 8800 GTS. The GeForce 8800 GTX is slightly longer than the GeForce 8800 GTS: its PCB is 10.5” long versus 9” for the GeForce 8800 GTS. As was widely reported prior to today’s launch, the GeForce 8800 GTX also sports dual PCIe power connectors. We asked NVIDIA why two power connectors were needed and were told that the max TDP requirement for the GeForce 8800 GTX is 177W. According to NVIDIA, though, this is a worst-case scenario where every functional unit on the GPU is maxed out. It doesn’t happen in typical gaming sessions, where their own testing has shown GPU power usage in the 116W-120W range on average and 145W as the high point.

    [image]

    Since each external PCIe power connector on the card itself is capable of supplying up to 75W to the card, and the PCIe slot itself also maxes out at 75W, this isn’t quite enough juice for the 177W TDP, hence the need for the second external PCIe power connector. By adding the second connector, this gives NVIDIA plenty of headroom on the 8800 GTX. And by the way, max TDP on the 8800 GTS is 147W, so it’s able to get by with just one PCIe power connector.

    [image]

    In terms of power requirements, NVIDIA’s power guidelines call for a minimum of a 450W power supply for the GeForce 8800 GTX (capable of supplying 30A on the 12V rail) and a 400W power supply for the GeForce 8800 GTS (with a current rating of 26A on the 12V rail).
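
The connector count and power supply guidelines can be checked with some simple arithmetic. This sketch uses only the figures quoted above (75W from the slot, 75W per external connector, and the TDP and 12V current ratings); the function names are ours.

```python
SLOT_WATTS = 75       # PCIe slot limit quoted above
CONNECTOR_WATTS = 75  # per external PCIe power connector


def connectors_needed(tdp_watts):
    """Smallest number of external connectors that covers the TDP."""
    remaining = max(0, tdp_watts - SLOT_WATTS)
    return -(-remaining // CONNECTOR_WATTS)  # ceiling division


def rail_watts(amps, volts=12):
    """Power available from a rail at the given current rating."""
    return amps * volts


print(connectors_needed(177))          # 8800 GTX -> 2
print(connectors_needed(147))          # 8800 GTS -> 1
print(rail_watts(30), rail_watts(26))  # 360 312
```

With one connector the card can draw 150W, which covers the 147W GTS but not the 177W GTX. Two connectors raise the limit to 225W.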



    Board analysis (cont’d) (Page 6 / 24)

    [image]



    Based on this, and the fact that nForce 680i SLI motherboards are shipping with three PCI Express graphics slots (PEG), it looks like NVIDIA may plan on offering triple-card SLI support at some point in the near future. Additional performance for SLI physics is another very real possibility, but this doesn’t explain why the GeForce 8800 GTS doesn’t have a second SLI connector.

    Our guess is that NVIDIA’s probably going to reserve their GX2 “Quad SLI” technology for the lower-power GeForce 8800 GTS, while the more powerful GeForce 8800 GTX will support triple card SLI.

    If you recall, NVIDIA’s original Quad SLI cards were closer in spec to the GeForce 7900 GT than the GeForce 7900 GTX due to the lower power/thermal requirements of the 7900 GT. It’s only natural that NVIDIA would employ the same strategy for GeForce 8800, perhaps with the addition of triple-card 8800 GTX SLI for the gamer with a triple PEG slot motherboard. A triple SLI 8800 GTX setup would also likely deliver better performance in some scenarios than a Quad SLI card based around the 8800 GTS’ specs.

    Again though, this is all pure speculation on our part.

    The cooling unit on both the GeForce 8800 GTS and the 8800 GTX is a dual-slot, ducted design that exhausts hot air from the GPU outside your system’s case. The cooling unit itself consists of a large aluminum heatsink with a mixture of both copper and aluminum heat pipes and, finally, a copper plate resting directly above the GPU itself. Supplying the entire apparatus with cool air is a large blower-style fan that looks a little intimidating but is actually rather quiet in operation. The 8800 GTX cooler is very similar to the 8800 GTS cooler, only a little longer.

    [image]

    Overall the new cooler seems to do a pretty good job of keeping the graphics core cool while also running nearly silently – like the GeForce 7900 GTX and 7800 GTX 512MB, both the GeForce 8800 GTS and 8800 GTX generate very little noise. You’d be hard-pressed to hear the card inside some cases.


    Production

    For both the GeForce 8800 GTX and 8800 GTS NVIDIA has farmed all board production out to a contract manufacturer. This means that whether you’re buying a board from ASUS, EVGA, PNY, XFX, or anyone in between, they’re all manufactured by the same company. NVIDIA isn’t even allowing factory overclocking for the first generation of GeForce 8800 GTX and GTS boards: they’re all shipping at the same clock speeds regardless of manufacturer. Board manufacturers are allowed to integrate their own unique cooling solutions however.

    EVGA for instance sent over one of their e-GeForce 8800 GTX ACS3 Edition cards which feature EVGA’s unique ACS3 cooler. The entire ACS3 card is enshrouded inside one large aluminum duct. At the top of the ducted cooler are the letters E-V-G-A. EVGA also adds an additional heatsink on the underside of the graphics card, just beneath the G80 GPU for additional cooling.

    [image]


    We haven’t had a chance to see how much of an impact these changes have over the stock NVIDIA cooler, but we’ll definitely be looking into it.

    Besides cooling, the only other way board manufacturers can differentiate themselves from one another for the first generation of GeForce 8800 cards is with their warranty and game/accessories bundle. EVGA for instance bundles their card with a copy of Dark Messiah, while BFG’s GeForce 8800 GTS card comes with a BFG T-shirt and Teflon stick pads for your mouse. Meanwhile, PNY’s 8800 GTX board ships in fairly snazzy packaging.

    [image]

    It will be interesting to see how this develops. Many of NVIDIA’s board partners we spoke with expect NVIDIA’s restrictions to loosen in time for second-generation GeForce 8800 boards to support factory overclocking. Perhaps by then manufacturing options will be expanded as well.

    [image]

    Because all the cards are shipping off one production line though, all GeForce 8800 cards will support two dual-link DVI connectors as well as HDCP. Furthermore, we’ve also been told that NVIDIA has no plans to offer different memory sizes for the GeForce 8800 GTX and GTS (for instance, a 256MB GeForce 8800 GTS card, or a 512MB 8800 GTX). For now at least, the standard configuration for the GeForce 8800 GTX will remain at 768MB, while the GeForce 8800 GTS will remain at 640MB. NVIDIA has no plans for an AGP variant of the GeForce 8800 GTX/GTS either.

    8800 Driver

    NVIDIA has incorporated a couple of changes in the GeForce 8800’s driver that we felt we should inform you about. For starters, traditional Coolbits overclocking is gone; it’s been replaced by NVIDIA’s nTune utility. This means that you’ll have to download nTune if you wish to overclock your GeForce 8800 graphics card. This is probably a good thing if you happen to have an nForce motherboard, as you can use nTune to do a number of different things besides graphics card overclocking. It’s probably a bad thing if you don’t own an nForce motherboard, as is the case for the many of you who have upgraded to Core 2 recently and are running it on a 975X or P965 motherboard.

    For those of you who fall into this category, downloading a 30MB+ app just to overclock your graphics card probably isn’t very desirable.

    Another change we noticed in the new driver is that the option to restore the classic NVIDIA control panel has been removed. We’re crossing our fingers that NVIDIA can get that feature incorporated back into their graphics driver, as we know that was a feature that was popular among many of you who don’t like NVIDIA’s new control panel interface.


    Lumenex AA/AF Engine (Page 7 / 24)

    Besides delivering new levels of 3D performance thanks to its new unified shading architecture with stream processing, NVIDIA’s G80 GPU has also been designed to deliver better image quality than previous NVIDIA GPUs. This is where NVIDIA’s Lumenex technology comes in.

    The Lumenex engine brings with it several new features:

  • 16x coverage sampling anti-aliasing (CSAA)
  • 16x “near perfect” angle-independent anisotropic filtering
  • 16-bit and 32-bit floating point texture filtering
  • Fully orthogonal 128-bit high dynamic range rendering with all the above features (including HDR+AA)
  • A full 10-bit display pipeline

    Better AA

    With the introduction of each new GPU architecture in recent years, NVIDIA has made subtle improvements to their AA engine. NVIDIA’s NV40 GPU, for instance, was the first GPU from NVIDIA to incorporate a rotated-grid algorithm for anti-aliasing (previous architectures relied on a square 2x2 grid pattern for each pixel), improving NVIDIA’s AA coverage of the horizontal and vertical dimensions for better AA quality. G70 upped the ante even further, adding transparency anti-aliasing for alpha textures as well as gamma-corrected anti-aliasing.

    For G80, NVIDIA improves their AA engine even further with the introduction of a new AA mode, coverage sampling anti-aliasing (CSAA).

    While NVIDIA’s 4x multi-sample anti-aliasing (MSAA) looks nice, it certainly isn’t perfect: jaggies can still be seen on the edges of lines up close. To provide even sharper image quality than 4xMSAA currently provides, even more samples must be taken, which in turn comes with a huge performance hit on the memory subsystem of the graphics card. The performance hit is so great that up until this point, NVIDIA never really considered adding additional AA modes to previous GPUs; they just weren’t capable of delivering high frame rates at higher AA levels.

    To help solve this problem NVIDIA has come up with their new CSAA mode. Unlike brute force MSAA where everything is blended evenly, with CSAA select portions of the scene can be blended with up to 16 samples per pixel for improved image quality (NVIDIA notes that in certain cases, such as the edge of stencil shadow volumes, the new CSAA modes will not be enabled, and those portions of the scene will fall back to NVIDIA’s more traditional 4xMSAA mode). The new CSAA mode uses what NVIDIA terms “coverage samples” in addition to the standard z and color samples to achieve this.

    Basically the idea with CSAA is to achieve 16xMSAA-like image quality, at performance levels closer to 4xMSAA. We ran a couple of quick tests and found the new CSAA mode delivering a performance hit of about 12-20% overall in our traditional testing, but we’ll be looking into this in more detail when we have a little more time to play with G80.
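
To get a feel for why CSAA can be cheaper than brute-force 16xMSAA, here is a rough per-pixel storage model. Everything in it is our own assumption rather than something NVIDIA has confirmed to us: 4 bytes of color and 4 bytes of z per stored sample, a 16xCSAA mode that keeps 4 full color/z samples plus 16 coverage samples, and a hypothetical cost of 1 bit per coverage sample.

```python
COLOR_BYTES = 4  # assumed 32-bit color per stored sample
DEPTH_BYTES = 4  # assumed 32-bit z/stencil per stored sample


def bytes_per_pixel(color_z_samples, coverage_samples=0, coverage_bits=1):
    """Approximate framebuffer bytes per pixel for an AA mode.

    coverage_bits is a hypothetical placeholder for the per-sample
    cost of a coverage sample; the real hardware format may differ.
    """
    full = color_z_samples * (COLOR_BYTES + DEPTH_BYTES)
    coverage = coverage_samples * coverage_bits / 8
    return full + coverage


print(bytes_per_pixel(4))      # 4xMSAA  -> 32.0
print(bytes_per_pixel(16))     # 16xMSAA -> 128.0 (brute force)
print(bytes_per_pixel(4, 16))  # 16xCSAA as modeled here -> 34.0
```

Under these assumptions the 16xCSAA footprint stays close to 4xMSAA instead of quadrupling, which is consistent with the goal of 16x-like quality at near-4x cost.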

    The Lumenex AA engine also adds conventional 8xMSAA for the first time in an NVIDIA graphics card. For well over five years now the highest MSAA mode had been 4xMSAA.


    Angle-independent anisotropic filtering

    One of the chief criticisms NVIDIA GPUs have been hit with as of late is their lack of support for angle-independent anisotropic filtering. With G80, this problem has finally been resolved resulting in significantly reduced texture shimmering. We’ve recorded new videos in Battlefield that show the improvement in AF quality:

    Editor’s Note: We’re working on getting the AF videos uploaded, some of the files are quite large so we also have to get proper throttling in place on our new video server, please bear with us on this one.

    10-bit display

    Previous NVIDIA GPUs relied on 8-bit DACs. For G80 NVIDIA has integrated a new custom ASIC that supports 10-bit display output. This allows for over a billion different colors to be displayed in comparison to 8-bit’s limitation of 16.7 million. This 10-bit display is a feature ATI has had on their latest Radeon GPUs, so NVIDIA’s basically playing a little catch up here.
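
The color counts follow from the number of bits per channel across three channels (red, green, and blue). A quick check, with a function name of our own choosing:

```python
def displayable_colors(bits_per_channel, channels=3):
    """Number of distinct colors for the given bits per color channel."""
    return 2 ** (bits_per_channel * channels)


print(f"{displayable_colors(8):,}")   # 16,777,216
print(f"{displayable_colors(10):,}")  # 1,073,741,824
```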



    Image quality comparison (Page 8 / 24)

    To see how NVIDIA’s new CSAA mode compares to more conventional 4xMSAA, as well as ATI’s Radeon X1950 XTX we loaded up the following scene in Battlefield 2142:

    [image]


    Note that all of the above images were taken with NVIDIA and ATI’s transparency AA/adaptive AA modes turned on, while gamma correction is enabled on the NVIDIA board to match ATI’s.

    As you can see, our test case has a number of interesting areas that present unique AA challenges to the GPU. For our testing purposes we’re focusing on two areas: the chain-link fence in the bottom right portion of the scene, and the light pole on the left side of the screen. (Keep in mind that this is only a small portion of the scene and that there are several other good areas to look at. Therefore we highly suggest you download the original PNG images and compare the various images on your own offline.)

    First let’s see how NVIDIA’s 4xMSAA mode on G80 compares to ATI’s Radeon X1950 XTX:



    GeForce 8800 GTX 4xMSAA



    GeForce 8800 GTX 8xMSAA




    Radeon X1950 XTX 4xMSAA




    Radeon X1950 XTX 6xMSAA


    Second area:



    GeForce 8800 GTX 4xMSAA




    GeForce 8800 GTX 8xMSAA



    Radeon X1950 XTX 4xMSAA




    Radeon X1950 XTX 6xMSAA


    Notes

    Both NVIDIA and ATI's 4xMSAA modes are pretty comparable to each other in our opinion. You've really got to zoom in on the images (300% or better) to really see the differences. In the past we've shown images that favor both ATI and NVIDIA's GPUs in 4xMSAA. In our testing today with BF2142 it's once again a close battle, but the G80 board does a slightly better job of smoothing the jaggies on the light pole for instance, particularly in the area where the light connects to the pole on both the left and right sides. The top of the pole is just slightly less jagged in our opinion.

    And now, NVIDIA’s 4xMSAA compared to 16xCSAA:



    GeForce 8800 GTX 4xMSAA




    GeForce 8800 GTX w/16xCSAA




    GeForce 8800 GTX w/16xCSAA




    Second area:



    GeForce 8800 GTX 4xMSAA




    GeForce 8800 GTX w/16xCSAA


    Notes

    It's hard to spot the impact of CSAA in our basic screenshots (it's a little easier in games running in motion, but still difficult); you really have to zoom in much closer to spot any real differences. Honestly, we prefer NVIDIA's new 8xMSAA mode to the new CSAA mode, at least in terms of visuals. It's much easier to spot the differences in-game between 16xCSAA and 8xMSAA.


    GeForce 8800 GTX w/16xCSAA



    GeForce 8800 GTX 8xMSAA



    Test SystemsPage:: ( 9 / 24 )

    System Setup


    Intel Core 2 Extreme X6800

    EVGA nForce 680i Motherboard
    ASUS P5W DH Deluxe

    2GB Corsair TWIN2X2048-6400C4

    ATI Radeon X1900 XT 512MB
    ATI Radeon X1900 XT 256MB
    ATI Radeon X1800 XT 512MB
    Catalyst 6.8

    EVGA e-GeForce 8800 GTX ACS3
    BFG GeForce 8800 GTS
    Driver version ForceWare 96.45

    NVIDIA GeForce 7900 GTX
    NVIDIA GeForce 7900 GS
    Driver version ForceWare 93.71

    250GB Maxtor Maxline III SATA Hard Drive w/16MB Cache

    Windows XP Professional SP2

    DirectX 9.0c


    Benchmarks

    Half-Life 2 Lost Coast
    Far Cry 1.33 (1.4 patch for ATI cards)
    F.E.A.R. 1.07
    Quake 4 1.2
    Elder Scrolls IV: Oblivion
    Battlefield 2 1.3
    Lock On: Modern Air Combat
    Call of Duty 2 1.3
    Dark Messiah of Might and Magic



    3DMark 06Page:: ( 10 / 24 )

    3DMark 06 – Direct3D








    HDR: HL2 Lost CoastPage:: ( 11 / 24 )

    Half-Life 2: Lost Coast – Direct3D








    Dark Messiah of Might and MagicPage:: ( 12 / 24 )

    Dark Messiah of Might and Magic – Direct3D





    Dark Messiah Performance 1600x1200
    Card                      Min FPS   Max FPS
    GeForce 8800 GTX          109       157
    GeForce 8800 GTS          69        110
    GeForce 7950 GX2          55        84
    GeForce 7900 GTX          38        59
    GeForce 7900 GS           13        23
    Radeon X1950 XTX          48        72
    Radeon X1900 XT 512MB     45        66
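
    To put those numbers in perspective, here's a quick sketch that expresses each card's result as a multiple of a baseline card. The FPS values come straight from the Dark Messiah table; the relative_to helper and the choice of the Radeon X1950 XTX as the baseline are our own picks for illustration, nothing more.

```python
# Min/max FPS from the Dark Messiah 1600x1200 table.
dark_messiah = {
    "GeForce 8800 GTX": (109, 157),
    "GeForce 8800 GTS": (69, 110),
    "GeForce 7950 GX2": (55, 84),
    "GeForce 7900 GTX": (38, 59),
    "GeForce 7900 GS": (13, 23),
    "Radeon X1950 XTX": (48, 72),
    "Radeon X1900 XT 512MB": (45, 66),
}


def relative_to(results, baseline):
    """Return each card's min and max FPS as a multiple of the baseline card's.

    Hypothetical helper for illustration; values are rounded to two decimals.
    """
    base_min, base_max = results[baseline]
    return {
        card: (round(min_fps / base_min, 2), round(max_fps / base_max, 2))
        for card, (min_fps, max_fps) in results.items()
    }


scaled = relative_to(dark_messiah, "Radeon X1950 XTX")
for card, (min_x, max_x) in scaled.items():
    print(f"{card:<24}{min_x:>6.2f}x min{max_x:>6.2f}x max")
```

    Swapping the baseline name lets you line the 8800 cards up against any other board in the table.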




    Battlefield 2142Page:: ( 13 / 24 )

    Battlefield 2142 – Direct3D




    Battlefield 2142 Performance 1600x1200
    Card                      Min FPS   Max FPS
    GeForce 8800 GTX          88        118
    GeForce 8800 GTS          58        82
    Radeon X1950 XTX          52        68
    Radeon X1900 XT 512MB     47        60
    Radeon X1800 XT 512MB     34        47



    Quake 4Page:: ( 14 / 24 )

    Quake 4 – OpenGL









    LOMACPage:: ( 15 / 24 )

    Lock On: Modern Air Combat – Direct3D







    Pacific FightersPage:: ( 16 / 24 )

    Pacific Fighters – OpenGL








    F.E.A.R. PerformancePage:: ( 17 / 24 )

    F.E.A.R. – Direct3D










    Oblivion Mountains HDRPage:: ( 18 / 24 )

    Oblivion – Direct3D





    Oblivion Performance 1600x1200x32
    Card                      Min FPS   Max FPS
    GeForce 8800 GTX          85        137
    GeForce 8800 GTS          63        104
    GeForce 7950 GX2          44        73
    GeForce 7900 GTX          34        63
    GeForce 7900 GS           21        40
    Radeon X1950 XTX          37        62
    Radeon X1900 XT 512MB     36        57
    Radeon X1800 XT 512MB     32        52






    Oblivion Performance 1600x1200x32
    Card                      Min FPS   Max FPS
    GeForce 8800 GTX          55        78
    GeForce 8800 GTS          42        56
    GeForce 7950 GX2          32        37
    GeForce 7900 GTX          21        34
    GeForce 7900 GS
    Radeon X1950 XTX          29        36
    Radeon X1900 XT 512MB     27        34
    Radeon X1800 XT 512MB     23        30




    Oblivion HDR+AAPage:: ( 19 / 24 )

    Oblivion – Direct3D













    Call of Duty 2Page:: ( 20 / 24 )

    Call of Duty 2 – Direct3D









    Far Cry HDR+AAPage:: ( 21 / 24 )

    Far Cry – Direct3D








    Company of HeroesPage:: ( 22 / 24 )

    Company of Heroes – Direct3D





    Company of Heroes Performance 1600x1200
    Card                      Min FPS   Max FPS
    GeForce 8800 GTX          44.2      230
    GeForce 8800 GTS          30.5      170
    GeForce 7950 GX2          17.4      208
    GeForce 7900 GTX          24.6      147
    GeForce 7900 GS           11        88
    Radeon X1950 XTX          19.7      159
    Radeon X1900 XT 512MB     18.7      138
    Radeon X1800 XT 512MB     11.6      103
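
    Minimum and maximum frame rates don't always rank the cards the same way, and Company of Heroes is a good example. The sketch below sorts the table both ways so you can compare positions. The ranking helper is a name we made up for illustration, and the GeForce 7900 GS entry assumes its result reads as 11 min / 88 max.

```python
# Min/max FPS from the Company of Heroes 1600x1200 table.
coh = {
    "GeForce 8800 GTX": (44.2, 230),
    "GeForce 8800 GTS": (30.5, 170),
    "GeForce 7950 GX2": (17.4, 208),
    "GeForce 7900 GTX": (24.6, 147),
    "GeForce 7900 GS": (11, 88),  # assumption: read as 11 min / 88 max
    "Radeon X1950 XTX": (19.7, 159),
    "Radeon X1900 XT 512MB": (18.7, 138),
    "Radeon X1800 XT 512MB": (11.6, 103),
}


def ranking(results, column):
    """Return card names ordered fastest to slowest on one column.

    column 0 is Min FPS, column 1 is Max FPS. Hypothetical helper.
    """
    return sorted(results, key=lambda card: results[card][column], reverse=True)


by_min = ranking(coh, 0)
by_max = ranking(coh, 1)

for card in coh:
    min_rank = by_min.index(card) + 1
    max_rank = by_max.index(card) + 1
    print(f"{card:<24}min rank {min_rank}   max rank {max_rank}")
```

    The GeForce 7950 GX2 is the card to watch here: it lands near the top on maximum FPS but well down the list on minimum FPS.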




    OverclockingPage:: ( 23 / 24 )









    Notes

    Once overclocked, the GeForce 8800 GTS is one helluva graphics card. We overclocked our BFG board to 620MHz core/907MHz memory. We actually hit higher speeds but had to lay off for the board to run completely stable.

    The GeForce 8800 GTX board on the other hand didn't scale quite as far, and thus we didn't include those scores on this page. It topped out at 622MHz core/920MHz memory, so 2MHz faster than the BFG card on the core and 13MHz higher on the memory.
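
    For anyone who wants to check our math on how close the two overclocks ended up, here's a small sketch. The clock speeds are the ones quoted above; the variable names are ours, and the percentages are simply the gap expressed relative to the overclocked GTS.

```python
# Final stable overclocks in MHz, as reported in the notes above.
gts_oc = {"core": 620, "memory": 907}  # BFG GeForce 8800 GTS
gtx_oc = {"core": 622, "memory": 920}  # GeForce 8800 GTX

# Absolute gap between the two overclocked boards, in MHz.
delta_mhz = {domain: gtx_oc[domain] - gts_oc[domain] for domain in gts_oc}

# The same gap as a percentage of the overclocked GTS clock.
delta_pct = {
    domain: round(100 * delta_mhz[domain] / gts_oc[domain], 2) for domain in gts_oc
}

for domain in gts_oc:
    print(f"{domain:<7}+{delta_mhz[domain]}MHz ({delta_pct[domain]}%)")
```

    In other words, the two overclocked boards ended up within about one and a half percent of each other on both clocks.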

    Currently NVIDIA's nTune utility only supports graphics core overclocking (although the shader processors are also OC'ed when you overclock the graphics core), but with future versions of the utility NVIDIA is hoping to add the ability to support independent clocks for the shader and graphics cores. We honestly can't wait to see that feature once it's implemented!

    UPDATE: We've received quite a few emails asking how the overclocked GeForce 8800 GTS is able to match the stock GTX in performance, so we'll provide a few theories here. Most importantly, keep in mind that we're dealing with very early drivers, and we're still barely seeing the potential of the GeForce 8800 GTX. The bottom line is that the 8800 GTX has more performance headroom than the GTS, but perhaps we're not seeing that today due to its early drivers. Another possibility is that the games we're testing aren't demanding enough to really push the G80's new shader architecture, so once the GeForce 8800 GTS is clocked higher it's able to deliver performance comparable to, if not slightly better than, the stock 8800 GTX.


    ConclusionPage:: ( 24 / 24 )


    It’s interesting that the GeForce 8800 is launching just days ahead of Sony’s next-gen PS3 console, which, by the way, is powered by NVIDIA’s RSX GPU. All the buzz in gaming right now surrounds Microsoft’s Xbox 360 and Sony’s PS3 game consoles, but the GPUs in both of these consoles ain’t got nothing on them that can’t be found in the GeForce 8800 GTX’s G80 GPU. It’s got a unified shader architecture just like Xbox 360’s Xenos GPU, and a whopping 128 shading units running at an astounding 1.35GHz! Meanwhile, the RSX GPU inside the PS3 is closer to NVIDIA’s G70/G71 GPUs found in the GeForce 7800/7900 series than to the GeForce 8800 GTX, and you saw how the 8800 GTX performed in comparison to NVIDIA’s previous graphics architecture in our benchmarks. It just isn’t close.

    And as good as the GeForce 8800 GTX looks today, it’s going to get even better with newer, upcoming drivers. Today’s G80 driver still has some rough edges. For instance, we noticed strange anomalies and graphical glitches when running Oblivion with HDR+AA, and NVIDIA’s still working out the kinks in Serious Sam 2 with HDR+AA as well. NVIDIA’s aware of these issues and is working hard to correct them. In fact, within the past 48 hours NVIDIA delivered a driver that they feel has fixed the Oblivion issue, but we haven’t had a chance to test with it yet. Some of the earlier G80 drivers also didn’t scale as well with SLI.

    Because of these issues, if you’re buying an 8800 card today, you’ll want to make sure that you download the latest GeForce 8800 driver off NVIDIA’s website rather than relying on the driver that ships with your card, as the website version is bound to be a newer driver that resolves some issues. NVIDIA also ran into a problem with an early batch of 8800 GTX cards that had the wrong resistor value. NVIDIA and their board partners, however, have been working to pull cards from this batch back to be fixed, and sent us this statement:

    “Today NVIDIA announced the hard launch and immediate availability of our new flagship GeForce 8800 GPUs. Some recent reports on the web mention a BOM error (wrong resistor value) on initial GeForce 8800 GTX boards. All boards with this problem were purged from the production pipeline. Product on shelves is fully qualified and certified by NVIDIA and its board partners. We and our board partners stand behind these products and back them with our full warranty.”

    In our opinion, all this is par for the course in this industry. Whenever you’re dealing with an untested, next-generation product there are always bound to be issues. Normally they escape the public eye and are only an issue to reviewers and board partners; this time, however, things were different and word spread like wildfire online. The bottom line is pretty simple: if you’re looking for all-out performance, there’s nothing faster than the GeForce 8800 GTX. It’s without a doubt the new king of the hill in 3D graphics.

    The GeForce 8800 GTS does lag behind the GTX pretty considerably in some tests. In fact, it’s outpaced by the GeForce 7950 GX2 in some cases, and the Radeon X1950 XTX came close in our testing with F.E.A.R. and Call of Duty 2, but it’s too early to say why. Perhaps NVIDIA cut down too many units in its architecture, or perhaps the driver needs a little more fine tuning. Most likely it’s some combination of the two. Considering its $450 MSRP is right in line with the current MSRP of the Radeon X1950 XTX, it definitely delivers a little more bang for the buck at current list prices (it remains to be seen what actual street prices will look like).
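
    Here's a rough sketch of that bang-for-the-buck comparison, using our Battlefield 2142 results at 1600x1200. It assumes both cards sell for exactly $450, which is a simplification since we only said the Radeon's MSRP is "right in line" with that figure. The fps_per_100_dollars helper is a name made up for illustration.

```python
# Assumed list price for both cards in US dollars (a simplification, see above).
ASSUMED_PRICE = 450

# Min/max FPS from the Battlefield 2142 1600x1200 table.
bf2142 = {
    "GeForce 8800 GTS": (58, 82),
    "Radeon X1950 XTX": (52, 68),
}


def fps_per_100_dollars(fps, price):
    """Frames per second delivered per $100 spent, rounded to two decimals."""
    return round(100 * fps / price, 2)


value = {
    card: (
        fps_per_100_dollars(min_fps, ASSUMED_PRICE),
        fps_per_100_dollars(max_fps, ASSUMED_PRICE),
    )
    for card, (min_fps, max_fps) in bf2142.items()
}

for card, (min_value, max_value) in value.items():
    print(f"{card:<18}{min_value:>6.2f} min FPS/$100{max_value:>7.2f} max FPS/$100")
```

    Once street prices settle, plugging in a separate price for each card will give a more realistic picture than a single assumed figure.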

    With the debut of the GeForce 8800 GTX and 8800 GTS, the GeForce 7950 GX2 and GeForce 7900 GTX will slowly go away, while the GeForce 7950 GT and 7900 GS carry over unchanged at $299 and $199 MSRP. If you’re looking for a hot deal, perhaps you may want to follow the pricing on these cards in the coming weeks. It’s beginning to look like ATI’s answer to GeForce 8800, R600, won’t debut until sometime early next year, so NVIDIA’s going to end 2006 in the same #1 performance position they held for most of last year. We’ll be looking into SLI performance next, as we’re eager to see how two GeForce 8800 cards perform together. Stay tuned…


  • © Copyright 2003 FS Media, Inc.