
NVIDIA GeForce GTX 580 Performance Preview
November 21, 2010 Darren Polkowski

Summary: Armed with its GF110 GPU and new vapor chamber cooler, the GeForce GTX 580 is supposed to address the key weakness of GTX 480 while delivering more performance for gamers. Does it accomplish its mission? Find out in today's review!


Introduction

During my undergraduate studies I had an economics professor tell us that the best dramas in the world happen between companies. Over the past five quarters we have witnessed one of the best stories the computer graphics business can offer. In September of 2009, AMD released the world’s first DirectX 11 enabled graphics cards, and not to be outshone, Nvidia unveiled its Fermi architecture to the public in November.




However, Taiwan Semiconductor Manufacturing Company (TSMC) disclosed that it was having problems with production, citing trouble with “chamber matching” and ion implanting supplies. Variability in any production process can be minimized, but the choice of production arrangements can create new problems. As an example, production bottlenecks can be offset if wafers are separated into smaller groups to spread the work across multiple stations and then merged back together after that step. This is fine, but high variances between the stations can be catastrophic. At 40nm we are talking about 5,000 transistors fitting on the end of a human hair. Therefore large variations in chemical solution recipes, or a failure to remove material at a consistently uniform rate, will result in large disparities across wafers. In the TSMC case, the troubles plagued AMD, Nvidia and consumers. Without a solid flow of wafers, the supply of AMD Radeon 5000 series graphics cards became tight. Another force compounded the effect of short supply, namely a lack of competition at the high end from Nvidia. Together these pushed prices for Radeon-based cards above their original launch targets. In January TSMC announced that it had resolved its production issues, and two months later Nvidia launched its flagship consumer graphics product, GTX 480.

Fast forward to last month when AMD released its Radeon 6000 series graphics boards. The company redesigned the silicon in an effort to maximize performance, balance the functional unit mix, cut power consumption, reduce cost per die and improve esthetics like acoustics (and to one-up Nvidia). However, every good story needs a happy ending. That brings us to the present. Like AMD, Nvidia had a lot of changes to make to its silicon, in this case GF100. Instead of the 512 ALUs (“CUDA Cores”), 16 polymorph engines (which include tessellation), power efficiency, and everything else we were told Fermi would deliver, we received GeForce GTX 480.

Now while that may sound like someone gave us a second-rate birthday present, in some ways it was. GTX 480 was the most powerful product Nvidia had ever produced, but it was not what we got all jazzed up about back in November. This is where we are today: Fermi 2.0 inside the GeForce GTX 580. On the pages that follow you will learn what the late Paul Harvey would call “the rest of the story.”





Broken Toys

Below you can see the original block diagram that Nvidia supplied to show some of the functional units inside of GF100. However, you will see something missing. What is missing in that hole is exactly what didn’t show up inside GeForce GTX 480. We received 480 (about 94%) of the possible 512 arithmetic logic units (ALUs) or “CUDA cores.” The missing 32 ALUs equal one full streaming multiprocessor (SM), which comprises one SIMD, 32 ALUs and a tessellator.

[image]


Each SM’s Polymorph Engine can generate about 0.25 triangles per clock, or 1 per clock for each Graphics Processing Cluster (GPC). This in theory appears to be a balanced approach, as the single Raster Unit in each GPC can render 1 triangle per clock. Not having the 16th unit meant that the GPU could create 3.75 triangles per clock cycle versus 4 rasterized per clock. This imbalance creates a slight bottleneck between creating and rasterizing triangles. Both are important for triangle subdivision and using textures for displacement mapping. While 0.25 triangles per clock may not seem like a lot, theoretically it equates to roughly 193 million triangles per second of diminished geometry throughput (at a 772MHz core clock) and less performance from tessellation and vertex texture fetching. I say theoretically because in the real world, not all triangles are equal. Under certain usage patterns it is closer to 2 billion triangles per second versus the 3 billion that a full 16 SM graphics processor could supposedly output.
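The setup-versus-raster arithmetic above can be sketched in a few lines of Python. Treat this as a back-of-the-envelope illustration only: the function names are made up for this example, and the 772MHz core clock is an assumption (GTX 580's reference clock) rather than a figure measured in this review.

```python
# Back-of-the-envelope geometry throughput using the per-clock rates quoted above.
# Assumption: a 772 MHz core clock. All names here are illustrative only.

TRIS_PER_CLOCK_PER_SM = 0.25   # one Polymorph Engine per SM
GPCS = 4                       # one Raster Unit per GPC, 1 triangle per clock each
CORE_CLOCK_HZ = 772e6          # assumed clock, not a measured value


def setup_rate(active_sms: int) -> float:
    """Triangles generated per clock by the active Polymorph Engines."""
    return active_sms * TRIS_PER_CLOCK_PER_SM


def raster_rate(gpcs: int = GPCS) -> float:
    """Triangles rasterized per clock, one per GPC Raster Unit."""
    return float(gpcs)


def per_second(tris_per_clock: float, clock_hz: float = CORE_CLOCK_HZ) -> float:
    """Convert a per-clock rate into a theoretical per-second rate."""
    return tris_per_clock * clock_hz


if __name__ == "__main__":
    for sms in (15, 16):
        gap = raster_rate() - setup_rate(sms)
        print(f"{sms} SMs: setup {setup_rate(sms):.2f}/clk, "
              f"raster {raster_rate():.2f}/clk, "
              f"gap {per_second(gap) / 1e6:.0f}M triangles/s")
```

These are theoretical peaks; as noted above, real workloads land well below them.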

[image]

GF100 debuted with slower than expected core clock and memory frequencies, increased power consumption, and additional heat, which required a more powerful and louder cooling solution. Despite its limitations, Nvidia’s GTX 480 is certainly a monster of a chip and can render almost anything currently on the market. That being said, who cares about GF100? We now have GF110. Nvidia took what it learned from launching GTX 480 and designed a piece of silicon that combined the best of GF100 with some of the improvements from GF104 to deliver what we now know as GeForce GTX 580.






The present I wanted for so long!

This is the Fermi I was told about, only better. FULL EVERYTHING! All things being equal, the architectural changes and fixes could deliver performance gains of as much as 5% in Unigine Heaven 2.1 and Metro 2033. Nvidia is even claiming as much as 15% in Dirt2.



As you can see from the tables, there are very nice improvements all over the place from GeForce GTX 480 to GTX 580. We are clearly expecting GeForce GTX 580 to crush geometry and be a bit soft on shading. A big surprise… right? When has that really been different when looking at AMD and Nvidia?



Additionally, Fermi is scalar heavy compared to the AMD GPUs. Beyond3D did some compute tests and showed that GF100 can issue twice as many scalar instructions as AMD, while AMD can issue twice as many Vec4 instructions. [Alex Voixu, et al. Beyond3D.com] Again, not too surprising, as old habits die hard and we like to do what we have always been good at. We expect GF110 to be an even bigger brute with geometry (and it is… we peeked at the test scores).

[image]


Nvidia gave GF100 support for more tile formats to enhance depth buffering and improve z-culling. The basic premise here is that the z-buffer is a table. If you look at this table like a texture, it can be stored, compressed, uncompressed, given different levels of detail and so on. You can therefore access larger data sets for depths and use whatever best speeds up your application. This is something that we could probably write an entire article on, but the key here is that streamlining the process of getting pixels to your screen is what is most important (and that they look correct). Therefore, pixels that cannot be seen because they sit on geometry obscured by other geometry should be removed from the work schedule as soon as possible. Just like people, the less time you spend thinking about or doing fruitless work, the better and more efficient you are.
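The idea of throwing away hidden pixels before doing any expensive work on them can be shown with a toy depth test. This is a minimal software sketch, not a description of how GF100's z-cull hardware or tile formats work; the `DepthBuffer` class and its method names are hypothetical, and it assumes smaller depth values are closer to the viewer.

```python
class DepthBuffer:
    """Toy z-buffer: a table of the nearest depth seen so far at each pixel."""

    def __init__(self, width: int, height: int, far: float = 1.0):
        self.depth = [[far] * width for _ in range(height)]
        self.shaded = 0   # fragments that passed the test and would be shaded
        self.culled = 0   # fragments rejected before any shading work

    def submit(self, x: int, y: int, z: float) -> bool:
        """Return True if the fragment is visible and should be shaded."""
        if z >= self.depth[y][x]:
            # Something closer already covers this pixel: skip the work.
            self.culled += 1
            return False
        self.depth[y][x] = z
        self.shaded += 1
        return True


if __name__ == "__main__":
    buf = DepthBuffer(4, 4)
    # Three fragments landing on the same pixel, at made-up depths.
    for z in (0.3, 0.7, 0.1):
        buf.submit(1, 1, z)
    print("shaded:", buf.shaded, "culled:", buf.culled)
```

Submitting geometry roughly front to back lets a test like this reject more fragments, which is the "less fruitless work" point made above.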




GF100 incorporated fully compliant IEEE 754-2008 single and double precision. Each of its ALUs uses fused multiply-add (FMA) instructions. Looking at the diagram below, you can see that an FMA retains higher precision than a MAD (multiply-add) because the result is rounded only once, at the end. An FMA also counts as two floating-point operations per clock cycle. This is huge when AMD and Nvidia are making talking points about GFLOP throughput calculations.
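The precision difference between a fused and an unfused multiply-add can be seen with a small experiment. The sketch below is an emulation under stated assumptions, not GPU code: it works in Python's double precision rather than the GPU's single precision, the unfused version rounds the intermediate product to nearest (a simplification), and the fused version is emulated with exact rationals and a single final rounding. The `peak_gflops` helper and the 1544MHz shader clock in the example call are likewise illustrative assumptions.

```python
from fractions import Fraction


def mad(a: float, b: float, c: float) -> float:
    """Unfused multiply-add: the product is rounded, then the sum is rounded."""
    product = a * b        # first rounding happens here
    return product + c     # second rounding happens here


def fma(a: float, b: float, c: float) -> float:
    """Fused multiply-add, emulated: exact arithmetic, one rounding at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def peak_gflops(alus: int, shader_clock_mhz: float) -> float:
    """Theoretical peak, counting each FMA as two floating-point operations."""
    return alus * 2 * shader_clock_mhz / 1000.0


if __name__ == "__main__":
    # Inputs chosen so the exact product (1 - 2**-60) cannot be stored exactly.
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    print("MAD:", mad(a, b, c))   # the tiny residual is lost in the first rounding
    print("FMA:", fma(a, b, c))   # the tiny residual survives
    print("Peak GFLOPS (assumed clock):", peak_gflops(512, 1544.0))
```

The peak figure is the kind of marketing number mentioned above: it assumes every ALU issues an FMA on every clock, which real workloads rarely sustain.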

[image]

GF110 got something juicy from its little brother. GeForce GTX 460 (GF104) introduced “full speed” 64-bit floating-point (FP16) texture filtering. DX9.0c introduced a minimum 32-bit floating-point lighting precision. When hardware started supporting FP16 blending, high dynamic range rendering (HDRR) really came alive. That being said, GF100 was designed to handle one texture address and four samples per texture unit. Each of GF110’s 64 texture units can still only deliver one address, but thanks to GF104 it can now return and filter four INT8 (32-bit), four FP16 (64-bit) or one FP32 (128-bit) texture samples. This should help GF110 not only with HDR processing but also with displacement mapping and texture-heavy applications.
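The bit widths in parentheses above are per-texel sizes, which follow from the per-channel precision if you assume four channels (RGBA) per texel. A tiny sketch of that arithmetic, with a hypothetical helper name:

```python
# Per-texel size = bits per channel x number of channels.
# Assumption: four-channel (RGBA) texels; the helper name is illustrative only.

BITS_PER_CHANNEL = {"INT8": 8, "FP16": 16, "FP32": 32}


def bits_per_texel(fmt: str, channels: int = 4) -> int:
    """Return the storage size of one texel in bits for the given format."""
    return BITS_PER_CHANNEL[fmt] * channels


if __name__ == "__main__":
    for fmt in BITS_PER_CHANNEL:
        print(f"{fmt}: {bits_per_texel(fmt)} bits per texel")
```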




GTX 580: On the outside

Nvidia made a lot of changes to how GeForce GTX 580 handles heat, voltage levels and noise output. The printed circuit board (PCB) is exactly the same length and width as GTX 480’s. GTX 580 and 480 are shorter than Radeon 5870 but longer than Radeon 6870, 6850, and 5850. The first change you can see is to the cooling system. In the image below there is a lack of plumbing extending out the side of the shroud compared to GeForce GTX 480. Another change is the shape of the cover. The top has been beveled to allow air better access to the intake. In a single card configuration this should not impact cooling performance. However, for SLI and Triple-SLI in tight cases it should help improve airflow.

[image]


A closer look at the cooler shows the same thing: no heat pipes extend out the side of the shroud as they do on GeForce GTX 480. Additionally, Nvidia made the fan intake opening 10mm wider in diameter: 65mm on GeForce GTX 580 versus 55mm on GTX 480.

[image]

Flipping the cards over, you will see that Nvidia removed the hole in the PCB. GTX 480 had this opening so that additional air could be drawn into the fan. In its place on GTX 580 is some new voltage monitoring circuitry. In the image below you can see it located on the top side of GTX 580’s PCB. There are three separate units. Each one monitors a different 12V power connection (the 8-pin connector, the 6-pin connector and the PCI Express slot). Nvidia states that this is to adjust performance when certain applications push the card’s power draw beyond the shipping specifications. While over-draw protection is great for the general consumer, we are not so sure it will be received as warmly by extreme overclockers.

[image]





GTX 580: Under the shroud

On the previous page we showed you that GeForce GTX 580 did not have copper heat pipes. Nvidia is using a vapor chamber to transport heat away from the processor. This isn’t a new technology. In fact, Sapphire used a vapor chamber on its HD 4890 Toxic cards to improve cooling efficiency for its factory overclock.

Heat pipes and vapor chambers are similar in functional design. Both utilize the natural process of phase change and the transport mechanism of a convection cell. A copper container is filled with a special liquid and then sealed. The liquid is special because it must have a boiling point close to, but not too far above, room temperature. It also needs to take up and give away energy freely when it changes between liquid and gas.

[image]


As heat is applied, the liquid evaporates once it reaches its boiling point. This gas then expands and moves away from the heat source. The gas will eventually reach a surface that is cooler than the boiling point and condense back into a liquid. Rinse and repeat to create a nice convection cell. Heat from the processor naturally moves toward the surface with the cooling fins, and the condensed liquid returns to be heated again.

You may be thinking, “The cards will not be situated with the GPU facing up when I put them into my case. Will this cooler work correctly?” Yes it will. Orientation of the cards is moot, as the heating and cooling create the transport mechanism for the liquid/gas. Heat pipes work on the same premise, and there are many aftermarket coolers which have pipes twisted in crazy directions yet still work properly.




[image]

Once you remove the cooler for the graphics processor you can see the large metal plate that acts as a heatsink for the memory. There were some small modifications to the fan and its controller. First, the radial fan sits on a foam cushion to help reduce vibrations. Second, a ring was added around the fins; it acts as a retainer to keep the fins from oscillating. Nvidia also added a card-specific fan profile to keep sound levels below an undisclosed threshold.

[image]

Once we remove the rest of the cooling apparatus we can see the bare PCB. The layout is almost identical to the GTX 480 except for the changes we already talked about. The card uses Samsung K4G10325FE-HC04 memory modules rated at 5.0Gbps. Now to the main event! Let’s let the cards duke it out for the belt!



Testing Setup

Our test bench for this article was a Maingear Shift. We wanted to show that systems from GeForce GTX 580 launch partners are available today. We will spend more time with the Maingear system in a separate, dedicated article.

[image]


The test system has been overclocked to 4.33GHz, which should eliminate any potential for the benchmarks to be CPU bound. You can find more specifics on the system in the CPU-Z screenshots. Additionally, we have supplied GPU-Z screenshots so you can see the card-specific details.

For each test we ran at least three runs. The average reported per resolution and configuration is the geometric mean of all of the runs, so as to get a true center of the data. The minimum is the minimum across all of the runs. We would like to demonstrate user experiences as much as possible. While it does not play well with PR and marketing types, it is what we experienced and what a gamer would experience under the same conditions. Additionally, the three resolutions we selected are the two most popular and the maximum. For the Microsoft DirectX SDK samples we used their default resolution, and for Unigine Heaven 2.1 we used 1920x1080. If you have comments about the test setup or what you would like to see run through its paces, please contact us.
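For readers who want to reproduce the scoring, the reduction described above can be sketched as follows. This is a minimal illustration assuming each run is recorded as an (average fps, minimum fps) pair; that data layout, the `summarize_runs` name and the sample numbers are all made up for the example and are not our actual results.

```python
from statistics import geometric_mean


def summarize_runs(runs):
    """Reduce several benchmark runs to one (average, minimum) pair.

    runs: a list of (average_fps, minimum_fps) tuples, one per run.
    The reported average is the geometric mean of the per-run averages;
    the reported minimum is the lowest minimum seen in any run.
    """
    averages = [avg for avg, _ in runs]
    minimums = [low for _, low in runs]
    return geometric_mean(averages), min(minimums)


if __name__ == "__main__":
    # Hypothetical numbers for one card at one resolution.
    sample_runs = [(61.2, 38.0), (59.8, 35.5), (60.5, 37.1)]
    average, minimum = summarize_runs(sample_runs)
    print(f"average: {average:.1f} fps, minimum: {minimum:.1f} fps")
```

The geometric mean is less sensitive to a single unusually high run than the arithmetic mean, which is why it gives a better "center" for a handful of runs.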



Nvidia Reference GeForce GTX 580


[image]


EVGA GeForce GTX 580


[image]


Asus EAH6870


[image]


Asus EAH6850 DirectCU


[image]




AMD Reference Radeon HD 5870


[image]


AMD Reference Radeon HD 5850


[image]




Microsoft SDK Test Samples & Unigine Heaven 2.1

As mentioned on the previous page, we took two samples from the DirectX 11 software development kit to see raw output. We chose the Detail Tessellation 11 and SubD11 samples because they do what their names suggest: tessellation and triangle subdivision. Earlier we suggested that GF110 would be a brute when it came to geometry. Well, it is. In the screenshot you can see the wireframe and all of the pretty geometry that underlies this sample. It is a classic example that takes a simple shape, uses a displacement map for height and then tessellates it into A LOT of triangles. As you can see for yourself, GeForce GTX 580 just chews through the geometry and says “Thank you! May I please have another?”

Detail Tessellation 11 SDK Sample


[image]


SubD11 SDK Sample


The next test is one that I will call overkill. The concept is simple: adding more geometry to the scene makes the surfaces smoother and more robust (and because cool kids use more triangles). The model in the test is rather basic, so subdividing the geometry becomes pointless. The test has merit though. Once again it shows that GF110 can devour tough geometry workloads and still spit out over 30 frames a second. This should translate well to games with less vain geometry demands.

[image]


This leads me to a point of contention for some of you: “How much tessellation is too much?” Obviously in this example there comes a point where simulating a super smooth surface on a cartoonish character really will not bring me any closer to realism. If there are people who play World of Warcraft and are completely immersed, it is certainly not due to the graphics realism. But before you pick up rocks and yell “Blasphemy!”, let me just state for the record: I personally love the concept of breaking down geometry into as many smaller bits as possible for more realistic surfaces. I have been saying so for almost 4 years. Tessellation with displacement mapping can save large amounts of memory address space and bandwidth. It is amazing. Still, there is a point of diminishing returns, and I believe the level of detail should be in the hands of the consumer.






Unigine Heaven 2.1


Here is a pretty benchmark. It uses tessellation on objects that definitely look better with it enabled. I would have liked to see even more used, specifically on the wooden boards so the grain stands out, and even to add smaller rocks and bits in between the cobbles on the walkway.

[image]


GTX 580 GeForces its will over the geometry. As with the other tests, GF110 crushes the geometry as set up by the Heaven 2.1 benchmark.




DX11 FPS: Battlefield: Bad Company 2 & Metro 2033

Battlefield: Bad Company 2


[image]



Battlefield: Bad Company 2 was very playable on this new high-end graphics processor. All of the test subjects performed admirably. However, regardless of what the numbers say, the game felt smoother with the two Nvidia cards. This is why we actually play games with the cards. Call it qualitative testing. (Yes honey, I have to “test” this game again.)







Metro 2033


[image]


Metro is an interesting bird. It has some issues at 2560x1600. On both the Nvidia and AMD cards, the benchmark would sometimes fail to load textures or even light the environment correctly. It did not always happen, but it did happen to all of the cards. We had to run more than our usual three tests, several times over, just to get everything to look correct before we actually started recording scores. Next time around I will be doing a walkthrough with Fraps instead of the built-in benchmark with Fraps. It will give a better representation of “real” game play and hopefully avoid a repeat of the shenanigans we experienced.







DX11 RTS: Civilization V & DX11 Driving Sim: Dirt2

Civilization V


[image]


Civilization V has some great surface tessellation. It looks immensely better with it enabled than without. Here is another place where too much tessellation could become a bad thing, but in its current state it looks great. This and other real time strategy games are about units. (“…guns, lots of guns” – Neo) Sometimes scrolling around the map can cause nasty spikes in performance. Low minimum frame rates are something you cringe at in an FPS game because they can mean virtual life or death. RTS games like Civilization V are all about micromanagement to the n-th degree, and having more units moving all over the map can cause delays, but not because the graphics card can’t render them. The delay can come from waiting for the CPU to calculate something, for something to load, for an event to start, and so on. That being said, the new GeForce did extremely well. It beat the AMD cards in every showing.








Colin McRae: Dirt2





[image]


I love the eye candy that Codemasters enables on the car models. It is an older game in terms of lifespan but it is yet another genre to test. Racing and flying simulations are a lot more about making things look real (however, Dirt could have better looking people).





Other Tests: Sound, Heat and Power

Noise Levels


The AMD Radeon HD 5870, more affectionately dubbed “The Batmobile” [Nate et al. at Legit Reviews], took the prize for loudest card overall (I still love that picture, guys). “Holy Idle!” What I mean to say is: look at the idle figures for the GTX 580. They are in line with the other cards in the suite. Our system noise was hard to isolate, but it was clear that the HD 5000 series cards were still loud, especially the 5850 at spin-up. That being said, the GTX 580 is quieter than the EVGA GTX 480. We would also like to point out that the 480 is of a later silicon generation. This means that the card has less leakage than the reference launch cards.



Power Utilization


With a lot of extra units running, GeForce GTX 580 could be expected to draw more power under load than GTX 480. But as you can see from the results, it consumed less than the GTX 480. It did, however, draw more than its peers from AMD. Our power meter was topping out around 450 watts. GeForce GTX 580 was also more power conscious when the system was idling.

Now you might be asking… “No Card?” There actually was a card in the test system. We used a Diamond BizView BV200 card to help with the power test. It is a fanless PCI card based on RV280 that allowed us to get into the operating system and back out. We left the card inserted for all of the power, temperature and acoustic tests for consistency.



Temperatures


I have to give kudos to Nvidia for controlling the temperatures on GTX 580. After several tests the result came back the same: 10 degrees cooler than GTX 480 under stress and 20 degrees cooler at rest. Part of this change is due to Nvidia controlling leakage in the chip, and some comes from a more efficient cooling system. I will say that 150 degrees is still hot, but it is much better than being able to hook up a pair and boil water for tea.








Conclusions

You should be able to draw your own conclusion about GeForce GTX 580. On one hand it is what we were expecting when GeForce GTX 480 launched. It is still a HUGE chip and is very warm. On the other it is the fastest card on the market. I am impressed with how it plays games in this test suite and beyond. Yes there are alternative cards for the price but that is NOT what this type of product is for.

Nvidia claimed it was a great product, and it stands up to a lot of those claims. It is quieter, more efficient, cooler, and delivers everything we thought we were getting in GF100 and more. Cards are available at e-tailers right now and through system integrators such as Maingear. If you have $500 per card and are looking for the fastest ride on the street, this is your baby.






About the author:

Darren now serves as the Editor-in-Chief at FiringSquad.Com. He has been working with online publishing, branding, technical marketing, content and social media since 2003. Prior to joining FiringSquad he held roles as the Public Relations Manager for Palit and the Graphics Editor for Tom’s Hardware Guide. He holds a bachelor’s degree in economics and a master’s degree in business administration. Darren enjoys long walks on the beach, snuggling by the fireplace with a technology white paper, getting 25 point pounces in L4D2, melting faces, and other general mayhem.





© Copyright 2003 FS Media, Inc.