| ||ATI Radeon HD 4870||ATI Radeon HD 5870||Difference|
|Die Size||263 mm²||334 mm²||1.27x|
|# of Transistors||956 million||2.15 billion||2.25x|
|# of Texture Units||40||80||2x|
|# of Shaders||800||1600||2x|
|Board Power||90W idle, 160W load||27W idle, 188W max||0.3x, 1.17x|
If there's only one key word to take away from the architecture of the new RV870 chip found inside ATI's Radeon 5800 series cards, it's "2X". Thanks to the smaller 40-nm manufacturing process, ATI can afford to double up on pretty much everything that made RV770 so special a year ago without having to charge double the price.
As you just saw on the specs page, RV870 boasts twice as many SIMD units as its predecessor, RV770. Each SIMD unit consists of 80 stream processors and one texture unit, so with twice the SIMD cores, you've got twice the number of stream processors (1600) and twice the number of texture units: 20 (80 effective). ATI and NVIDIA use different nomenclature for what they both describe as a "stream processor" -- the physical number of stream processing units inside RV870 is actually 320 -- but regardless of the term you use to describe them, it's an impressive amount of compute power, as the 5870's 2.72 TeraFLOPS can attest.
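To see how those figures fit together, here is a rough back-of-the-envelope sketch in Python. The SIMD, stream processor and TeraFLOPS numbers come from this article; the 5-wide grouping is the "4 stream cores plus 1 special function core" layout described for each stream processing unit. The one outside assumption is that each stream processor completes one multiply-add (counted as 2 floating-point operations) per clock, so the resulting clock is an implied figure, not a quoted spec.

```python
# Figures quoted in the article for RV870 (Radeon HD 5870).
SIMD_CORES = 20
STREAM_PROCESSORS_PER_SIMD = 80
LANES_PER_UNIT = 5          # 4 stream cores + 1 special function stream core
PEAK_TFLOPS = 2.72

# Assumption (not stated in the article): one multiply-add per stream
# processor per clock, counted as 2 floating-point operations.
FLOPS_PER_SP_PER_CLOCK = 2

# Marketing count of "stream processors".
stream_processors = SIMD_CORES * STREAM_PROCESSORS_PER_SIMD

# Physical 5-wide stream processing units.
physical_units = stream_processors // LANES_PER_UNIT

# Core clock (GHz) implied by the quoted peak throughput under the assumption above.
implied_clock_ghz = PEAK_TFLOPS * 1000 / (stream_processors * FLOPS_PER_SP_PER_CLOCK)

print(f"stream processors: {stream_processors}")
print(f"physical units:    {physical_units}")
print(f"implied clock:     {implied_clock_ghz:.2f} GHz")
```

Under that assumption, the 1600 and 320 counts and the 2.72 TeraFLOPS figure are all consistent with a single core clock.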
The overall layout of RV870 is similar to RV770, just bigger. See for yourself in this RV870 block diagram:
The 20 SIMD cores are depicted as the red squares in the center of the diagram. If you look a little closer, you can sit and count the individual stream processing units for yourself. Like RV770, each stream processing unit consists of 4 stream cores plus 1 special function stream core, all tied to a branch unit and general purpose registers. ATI has tweaked them to improve their IPC.
Tied to each SIMD core is its own dedicated texture unit, again, just like RV770. RV870 boasts improved texel fill rate, up to 68 (bilinear filtered) Gigatexels/sec, and improved data fetch rate, up to 272 billion fetches/sec. ATI's also improved the cache bandwidth of the L1 texture caches tied to the texture units. RV870 sports up to 1 TB/sec of L1 texture fetch bandwidth, while peak bandwidth between the L1 and L2 caches tops out at up to 435GB/sec.
In comparison, RV770 featured up to 480GB/sec of L1 texture fetch bandwidth and up to 384GB/sec of bandwidth between the L1 and L2 caches.
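The texture numbers can be cross-checked the same way. This short Python sketch uses only figures quoted in the article (80 effective texture units, 68 Gigatexels/sec, 272 billion fetches/sec, and the cache bandwidths for both chips). It assumes each effective texture unit produces one bilinear-filtered texel per clock, and it treats "1 TB/sec" as 1000 GB/sec; both are assumptions for illustration.

```python
# Figures quoted in the article.
TEXTURE_UNITS = 80                 # effective texture units on RV870
TEXEL_RATE_GTEXELS = 68.0          # bilinear filtered Gigatexels/sec
FETCH_RATE_GFETCHES = 272.0        # billion fetches/sec
L1_BW_GBS = {"RV770": 480.0, "RV870": 1000.0}    # 1 TB/sec taken as 1000 GB/sec
L1_L2_BW_GBS = {"RV770": 384.0, "RV870": 435.0}

# Assumption: one bilinear texel per effective texture unit per clock,
# so the texel rate divided by the unit count gives the core clock in GHz.
implied_clock_ghz = TEXEL_RATE_GTEXELS / TEXTURE_UNITS

# Raw fetches needed per filtered texel, from the two quoted rates.
fetches_per_texel = FETCH_RATE_GFETCHES / TEXEL_RATE_GTEXELS

# Generation-over-generation cache bandwidth gains.
l1_gain = L1_BW_GBS["RV870"] / L1_BW_GBS["RV770"]
l1_l2_gain = L1_L2_BW_GBS["RV870"] / L1_L2_BW_GBS["RV770"]

print(f"implied clock:     {implied_clock_ghz:.2f} GHz")
print(f"fetches per texel: {fetches_per_texel:.0f}")
print(f"L1 fetch gain:     {l1_gain:.2f}x")
print(f"L1-to-L2 gain:     {l1_l2_gain:.2f}x")
```

The ratio of four fetches per texel matches what a bilinear filter needs (four samples per output texel). The comparison also shows that L1 fetch bandwidth roughly doubled along with the texture units, while L1-to-L2 bandwidth grew far more modestly.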
Up at the top of the block diagram, you'll notice another significant tweak ATI has made with RV870: the addition of a second rasterizer in the graphics engine of the chip. With a second rasterizer, RV870 feeds more pixels into the engine than its predecessor; this is important when you're dealing with a GPU that's outfitted with 1600 stream processors. ATI's also updated their tessellator for DirectX 11 compliance.
256-bit memory interface
Moving to the bottom of the block diagram, you'll also see RV870's four 64-bit memory controllers, just like RV770. This probably comes as a bit of a disappointment to those of you who were hoping for a wider memory interface and the potential performance boost it could bring under high resolution, high AA scenarios (especially since some of the rumor sites were saying earlier this summer that RV870 would possess a wider memory interface), but in speaking with ATI, they felt that a 256-bit interface with high-speed GDDR5 was the way to go given their die size and transistor budget constraints.
Obviously implementing a wider interface is going to drive those demands up, which would've required ATI's engineers to give up some of RV870's 20 SIMD cores to compensate and remain on budget. It's a tradeoff you have to make: integrate more stream processors or go wider with a larger interface? Given the lessons learned with R600's 512-bit memory interface (where ATI basically couldn't tap into all the bandwidth the larger interface provided and decided to go back to 256-bit for RV670), ATI decided to stick with a 256-bit memory interface and instead integrate more SIMD cores.
The way ATI sees it, GDDR5 data rates (i.e. clock speeds) are constantly improving while GDDR5 prices continue to go down. Relying on faster memory is therefore a more cost-effective way to add bandwidth than implementing a larger memory interface.
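The tradeoff between bus width and data rate is simple arithmetic: peak bandwidth is the bus width times the per-pin data rate, divided by 8 to convert bits to bytes. The Python sketch below illustrates it. The function names and the data rates plugged in (2.5, 4.0 and 5.0 Gbps) are hypothetical values chosen for illustration, not ATI's shipping memory clocks.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/sec for a bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


def required_data_rate_gbps(bus_width_bits: int, target_gbs: float) -> float:
    """Per-pin data rate (Gbps) a bus of this width needs to hit a target bandwidth."""
    return target_gbs * 8 / bus_width_bits


# Illustrative data rates only (assumptions, not quoted specs).
for rate in (4.0, 5.0):
    print(f"256-bit @ {rate} Gbps: {peak_bandwidth_gbs(256, rate):.0f} GB/sec")

# A hypothetical 512-bit bus running at half the data rate...
wide_bus = peak_bandwidth_gbs(512, 2.5)
# ...is matched by a 256-bit bus once the memory runs twice as fast.
print(f"256-bit needs {required_data_rate_gbps(256, wide_bus):.1f} Gbps "
      f"to match 512-bit @ 2.5 Gbps ({wide_bus:.0f} GB/sec)")
```

In other words, every doubling of the GDDR5 data rate buys the same peak bandwidth as doubling the bus width, without the extra die area and board complexity a wider interface would bring.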
ATI has made tweaks to their memory interface for RV870 though. To ensure data is transferred without errors, the controller can perform CRC checks on data transfers. ATI says this offers improved reliability at high clock speeds. The L2 cache size has been doubled to 128KB per memory controller. ATI also says GDDR5 memory clock temperature compensation enables speeds approaching 5Gbps.
Besides its smaller manufacturing process, which naturally helps to reduce the GPU's power consumption, ATI's also integrated tweaks to further reduce RV870's power consumption. As listed in the specs on the previous page, the chip consumes as little as 27W at idle. Impressive for a GPU that contains over 2 billion transistors.
A new low power strobe mode has been added to reduce memory power consumption, and ATI is aggressive about reducing clock speeds and voltages at idle. At idle, Radeon 5870 runs at just 400MHz core/1200MHz memory.
For CrossFire users, ATI has also added a new ultra low power state for multi-GPU configurations that comes closer to shutting the secondary GPU(s) down when not in use. The second card throttles down to just 157MHz core/300MHz memory.