Hands On With AMD's Opteron 242
April 22, 2003 Chris Angelini
Summary: Recently we had the chance to sit down with three Opteron 242 servers. Each system ran a different OS (Windows 2000, 64-bit Windows, or 64-bit SuSE Linux) on a different motherboard. In this article we go over the highlights of the Opteron architecture and our experience so far. Is AMD's 64-bit platform as promising as it looks on paper? Read on to find out!
Introduction
It seems like we’ve been hearing about AMD’s 64-bit computing initiative for years now, and indeed we have. I first remember hearing about the K8 architecture in 1999, though at that time it was all speculation surrounding the performance implications of x86-64 and a multi-core processor die. AMD has come a long way since then, proving itself worthy of going head-to-head with Intel’s highly scalable Pentium 4. The K7 has served AMD well and will continue to do so, but the time has come for a successor.
The much-anticipated K8 is here, and we recently had the opportunity to spend some time working with the processor. But before you reach for the nearest credit card, understand that the products we dealt with are server chips aimed at dual processor machines, none of which are to be equipped with AGP slots. At least for the time being, the 64-bit processor we’ve come to know as Opteron is all about the server scene.
But don’t worry. In a couple of months’ time, we’ll begin seeing single-processor versions (as well as Opteron chips designed for higher-end eight-way servers), which will be suited for workstation use alongside NVIDIA’s nForce3 Professional chipset. We’ll certainly revisit Opteron when that happens, but for now, let’s focus on the processor that AMD hopes will help resuscitate its bottom line.
Opteron’s architecture has been public knowledge since late 2001, yet it remains entirely relevant, since the eighth-generation design deviates in several ways from the Athlon XP (K7) we’re used to. Of course, the most touted addition is support for x86-64, which allows a processor to use 64-bit addressing and 64-bit integer operands (32-bit operands remain the default) when paired with a compatible operating system and applications compiled for it. Tim Sweeney of Epic, one of the most influential game developers, says, “We expect to support 32-bit and 64-bit clients and servers for game play, but might require 64-bit for content creation, because of the significant requirements of our new content development tools.” His comment exemplifies the importance of 64-bit virtual addressing for those who can take advantage of its memory and addressability benefits.
So for now, powerful servers and workstations stand to benefit most from the transition to 64-bit computing, while the mainstream advantages are few. Will we ever see a time when average home users will enjoy the fruits of a 64-bit setup? Tim seems to think so.
“Because of our new content tools, we're already feeling a very strong need for 64-bit internally right now, and by year's end I expect we'll look at 64-bit as something that we couldn't possibly do our jobs without. We expect this sentiment to carry over to other game developers in the next 12 months, to high-end consumers over the next 24 months, and the wide mainstream all the way down to the lowest end of the market within 36 months.”
Tim goes on to say, “In our next-generation technology, we are building 2,000,000-polygon meshes, and running them through a preprocessing program that analyzes the geometry and self-shadowing potential of the mesh based on thousands of incident lighting directions using per-pixel floating point math, and compresses all of this data down to texture maps, bump maps, and 16-component spherical harmonic maps at as high a resolution as possible.
This process uses many gigabytes of memory, and implementing it on 32-bit CPUs places a lot of constraints on the size of meshes we can preprocess and the resolution of maps we can generate. With onerous programmer gymnastics, this kind of algorithm could be made disk-based or Address Windowing Extensions aware, but these approaches require an order of magnitude more development effort, and aren't practical given the frequency with which we change and improve our algorithms.”
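To put Sweeney’s point in rough numbers: a 32-bit pointer can name at most 2^32 bytes (4GB), and 32-bit Windows by default leaves only half of that to an application. The short Python sketch below compares those ceilings with the 48-bit virtual addresses this first generation of K8 processors translates (x86-64 pointers are 64 bits wide, but these chips implement only the low 48 bits). The 2GB user-space figure assumes Windows’ default configuration, and the variable names are purely illustrative.

```python
# Rough comparison of addressable virtual memory (binary units).
GIB = 2**30
TIB = 2**40

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits

flat_32bit = addressable_bytes(32)        # everything a 32-bit pointer can name
win32_user_default = flat_32bit // 2      # assumption: default 2GB user / 2GB kernel split
k8_virtual = addressable_bytes(48)        # 48-bit virtual addresses on first-generation K8

print(f"32-bit pointer:          {flat_32bit // GIB} GiB")
print(f"32-bit Windows, user:    {win32_user_default // GIB} GiB")
print(f"48-bit virtual space:    {k8_virtual // TIB} TiB")
print(f"ratio (48-bit / 32-bit): {k8_virtual // flat_32bit}x")
```

A multi-gigabyte preprocessing job like the one Sweeney describes simply does not fit under the 2GB ceiling without the disk-based or AWE workarounds he mentions.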
Eighth-Generation Microarchitecture
There are many similarities between the Athlon XP we know today and the Opteron being introduced. Still, time has necessitated certain modifications to ensure that Opteron, and later this year Athlon 64, remain competitive. The first design change borrows a technique Intel used to enhance the scalability of the Pentium 4: AMD has added two stages to its pipeline, resulting in a 12-stage integer and 17-stage floating-point pipeline.
As we learned with the Pentium 4, a longer pipeline is a boon when it comes to increasing operating frequency, but it reduces the number of instructions a processor can execute per clock cycle (IPC). AMD is fully aware of these ramifications and, like Intel, has taken measures to compensate. In fact, AMD claims it will be able to improve IPC beyond what we’re currently seeing with the Athlon XP family.
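The trade-off is easiest to see with the usual first-order model, performance ≈ IPC × clock frequency. The Python sketch below uses purely hypothetical figures (they are not measurements of any Athlon XP or Opteron part) to show that a deeper pipeline pays off only if the clock gain outweighs the IPC loss, and why recovering IPC matters so much.

```python
def throughput(ipc: float, clock_ghz: float) -> float:
    """First-order model: billions of instructions per second = IPC x clock (GHz)."""
    return ipc * clock_ghz

# All figures below are hypothetical, chosen only to illustrate the trade-off.
shallow = throughput(ipc=1.00, clock_ghz=2.0)      # shorter pipeline
deep = throughput(ipc=0.90, clock_ghz=2.5)         # 25% more clock, 10% less IPC
deep_tuned = throughput(ipc=0.98, clock_ghz=2.5)   # IPC partly recovered (e.g. better branch prediction)

for name, value in (("shallow", shallow), ("deep", deep), ("deep, tuned", deep_tuned)):
    print(f"{name:12s} {value:.2f}")
```

In this made-up case the deeper design wins even before tuning, but had the IPC loss been 20% instead of 10%, the 25% clock gain would have been entirely cancelled out.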
Integrated DDR Memory Controller
Another step AMD has taken to further improve operating frequency is pairing its existing .13-micron manufacturing process with Silicon-on-Insulator (SOI) technology, which reduces transistor capacitance by roughly 25 percent, even as AMD adds a significant number of new transistors.
On the flip side, AMD is looking to increase IPC by moving the platform’s memory controller away from its traditional residence, the North Bridge or Memory Controller Hub, depending on whose architecture you follow, and onto the processor die itself. It’s no secret that memory bandwidth has become a pivotal statistic in describing the capabilities of a graphics card, and it has grown increasingly important for processors as well.
The Opteron’s memory controller is of the dual-channel DDR variety, providing a 128-bit interface with support for DDR200, DDR266, and DDR333 memory. In a single-processor system, DDR333 can provide up to 5.3GB per second of bandwidth. In a dual-processor machine like the ones we had the opportunity to test, each processor has its own controller, so the platform’s aggregate bandwidth doubles to 10.6GB per second. The unfortunate consequence is that in a dual-processor system, four memory slots (two per processor) must be populated to realize the platform’s full bandwidth potential. According to AMD, as the processor’s operating frequency scales upward, the latencies incurred by memory accesses will continue to drop as a result of the on-die controller.
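These bandwidth figures follow directly from bus width times transfer rate. A minimal Python sketch, assuming DDR333’s nominal rate of about 333 million transfers per second (a 166.67MHz clock, double-pumped) and treating each processor’s controller independently in the two-way case:

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s (decimal): bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_sec / 1000

DDR333_MTS = 2 * 500 / 3   # assumption: 166.67MHz clock, two transfers per clock (~333 MT/s)
CHANNEL_BITS = 64          # one DDR channel is 64 bits wide

per_cpu = peak_bandwidth_gbs(2 * CHANNEL_BITS, DDR333_MTS)  # dual-channel = 128-bit interface
two_way = 2 * per_cpu      # each Opteron drives its own memory, so the totals add

print(f"one Opteron, dual-channel DDR333: {per_cpu:.2f} GB/s")
print(f"two Opterons combined:            {two_way:.2f} GB/s")
```

This prints roughly 5.33 and 10.67GB per second; the 5.3GB and 10.6GB figures quoted above are the same numbers truncated.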
SIDEBAR: If you’d like to check out AMD’s Opteron launch event, you can catch the show on AMD’s website.
Memory Controller (cont’d)/HyperTransport
The most obvious disadvantage of an integrated memory controller is the risk of falling behind in memory technology. Look at VIA’s product history, for example. VIA has traditionally revised its memory controllers several times in the name of milking extra performance from the Athlon family, and as a result we’ve seen the KT133, KT133A, KT266, KT266A, and so on.
AMD won’t have that luxury with the K8 family, since any change to the memory controller means a new processor. Opteron currently supports dual-channel DDR333, while Intel is using DDR400 with its recently introduced 875P chipset. So if AMD hopes to remain competitive as Opteron evolves and Athlon 64 emerges, it will undoubtedly need to revise the memory controller more than once. After all, DDR-II isn’t that far over the horizon.
Unfortunately, we couldn’t verify AMD’s claim of 10.6GB per second: SiSoft Sandra 2003 failed to properly identify the on-die memory controller and repeatedly returned a memory bandwidth figure of around 500MB per second.
Formerly known as Lightning Data Transport, HyperTransport is AMD’s data transfer bus that facilitates interaction between the processor (or processors in a multi-processing environment), PCI-X Tunnels, I/O Hubs, and AGP 8x Tunnels. The HT links within the Opteron are 16 bits wide, yielding 3.2GB per second of bandwidth in each direction. Opteron has three such links, whereas Athlon 64 will host a single link with 6.4GB per second of aggregate (bidirectional) bandwidth.
One of the main benefits of HyperTransport is its scalable nature. Running natively at 200MHz, HT can be pushed to 800MHz DDR, for an effective 1.6GHz. Furthermore, HyperTransport links can be 2, 4, 8, 16, or 32 bits wide, offering anywhere from 200MB/s to 12.8GB/s of aggregate throughput. AMD’s goal is to alleviate front side bus, memory, chip-to-chip, and I/O expansion bottlenecks.
The initial round of motherboards supporting the Opteron will be based on AMD’s own 8000-series chipset. As we’ve heard before, AMD is not a chipset manufacturer, so we’ll see what role it plays once the likes of SiS, VIA, and NVIDIA come online with their own products. Each component of AMD’s chipset connects to the HyperTransport bus through a link with a different throughput. For instance, the 8111 I/O Hub features an 8-bit link with 800MB per second of aggregate bandwidth. The 8131 I/O Bus Tunnel is a higher-performance device with PCI-X support, and therefore needs more bandwidth; it connects via a 16-bit interface boasting 6.4GB per second. Similarly, the 8151 Graphics Tunnel uses one 16-bit connection.
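HyperTransport’s numbers come from the same width-times-rate arithmetic, with two wrinkles: the link is double-pumped (two transfers per clock), and each link has a separate path in each direction, so the aggregate figures above count both directions. The Python sketch below assumes those conventions and reproduces the figures in this section. The 200MHz clock used for the 8111 is inferred from its 800MB per second rating rather than stated in the text, and the function name is illustrative.

```python
def ht_bandwidth_mbs(width_bits: int, clock_mhz: int, both_directions: bool = True) -> int:
    """Peak HyperTransport bandwidth in MB/s (decimal units).

    width_bits: link width in each direction (2, 4, 8, 16 or 32 bits)
    clock_mhz:  link clock; data is transferred on both clock edges (DDR)
    """
    per_direction = width_bits * clock_mhz * 2 // 8   # bits per clock x 2 edges -> bytes
    return per_direction * 2 if both_directions else per_direction

examples = {
    "Opteron link, one direction (16-bit @ 800MHz)": ht_bandwidth_mbs(16, 800, both_directions=False),
    "Opteron link / 8131 tunnel, aggregate (16-bit @ 800MHz)": ht_bandwidth_mbs(16, 800),
    "Narrowest, slowest link (2-bit @ 200MHz)": ht_bandwidth_mbs(2, 200),
    "Widest, fastest link (32-bit @ 800MHz)": ht_bandwidth_mbs(32, 800),
    "8111 I/O Hub (8-bit, assumed 200MHz)": ht_bandwidth_mbs(8, 200),
}
for name, mbs in examples.items():
    print(f"{name}: {mbs} MB/s")
```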
AMD is quick to point out that, compared to Intel’s Itanium, it has successfully implemented 64-bit support without crippling the performance of 32-bit applications. The Opteron is backward compatible with 32-bit code and with the architectural extensions that have emerged over the past few years, including MMX, 3DNow!, Enhanced 3DNow!, SSE, and SSE2. Like the Athlon XP before it, Opteron sports 64KB of L1 data cache and 64KB of L1 instruction cache, in addition to a 1MB L2 cache that is 16-way set-associative. Finally, AMD has reportedly reduced the latencies of its Translation Lookaside Buffers (TLBs) and enhanced its branch predictor to help offset the IPC penalties potentially incurred by adding extra stages to the pipeline.
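To make “16-way set-associative” concrete: the cache is divided into sets of 16 lines each, and the middle bits of an address choose the set. The Python sketch below assumes 64-byte cache lines and treats addresses as physical, ignoring finer details of the real indexing; the function name is illustrative.

```python
# Geometry of a 1MB, 16-way set-associative L2 cache.
L2_BYTES = 1024 * 1024     # 1MB, per the text
WAYS = 16                  # 16-way set-associative, per the text
LINE_BYTES = 64            # assumption: 64-byte cache lines

NUM_SETS = L2_BYTES // (LINE_BYTES * WAYS)      # 1024 sets of 16 lines each
OFFSET_BITS = (LINE_BYTES - 1).bit_length()     # 6 bits pick a byte within a line
INDEX_BITS = (NUM_SETS - 1).bit_length()        # 10 bits pick the set

def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for an address, treating it as physical."""
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset

# Addresses exactly NUM_SETS * LINE_BYTES (64KB) apart land in the same set with
# different tags; up to 16 of them can be cached at once before one must be evicted.
a = 0x12345678
b = a + NUM_SETS * LINE_BYTES
print(NUM_SETS, OFFSET_BITS, INDEX_BITS)
print(split_address(a), split_address(b))
```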
SIDEBAR: AMD’s codename for Opteron was SledgeHammer
Opteron Packaging
As previously mentioned, AMD is manufacturing Opteron on a .13-micron SOI process at Fab 30 in Dresden. Because it uses three HyperTransport links, the processor requires no less than a 940-pin socket interface, with ceramic micro PGA packaging.
Athlon 64, which doesn’t need such a robust platform, can get away with a single HyperTransport link. Further, it can be speculated that by the time Athlon 64 surfaces, AMD will have revised the processor’s memory controller to add DDR400 support. If so, there is a strong possibility that the processor will use only a single channel of DDR memory, whose 3.2GB per second of bandwidth would correspond to the 3.2GB per second the single HyperTransport link carries in each direction. Additionally, by stripping two HyperTransport links, AMD has reduced the number of pins needed by Athlon 64 to a more conservative 754.
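The speculation rests on the two figures lining up. A quick Python check, assuming DDR400’s nominal 400 million transfers per second on one 64-bit channel and a 16-bit HyperTransport link clocked at 800MHz with two transfers per clock:

```python
def bandwidth_mbs(width_bits: int, mega_transfers: int) -> int:
    """Peak bandwidth in MB/s = (width in bytes) x (million transfers per second)."""
    return width_bits // 8 * mega_transfers

single_ddr400 = bandwidth_mbs(64, 400)         # one 64-bit DDR400 channel
ht_link_one_way = bandwidth_mbs(16, 800 * 2)   # 16-bit link at 800MHz, double-pumped

print(single_ddr400, ht_link_one_way)   # 3200 3200
```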
Considering its size, the 940-pin Opteron runs surprisingly cool under a large copper heat sink. It should also be noted that the server boards all seem to require an ATX power supply with an extra 8-pin auxiliary power connector, much like the first 760MP boards that emerged when the Athlon MP was introduced. Initial specifications for ASUS’ nForce3-powered SK8N indicate that the workstation board will run trouble-free with a standard ATX power supply.
Hands-On with Opteron 242
AMD’s Opteron naming scheme is easy enough to understand, but it is decidedly vague about performance. The first digit of the model number indicates the system the chip is designed for: a one-way, two-way, or eight-way environment. The following two digits somehow relate to the processor’s performance, virtually eliminating clock speed as a publicized feature. For instance, AMD is launching three Opteron processors immediately: the 240, 242, and 244. The 240 runs at 1.4GHz, the 242 at 1.6GHz, and the 244 at 1.8GHz. All three are intended for 2P machines. Presumably, subsequent releases will carry incrementally higher model ratings.
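The scheme can be expressed as a small lookup. The Python sketch below is a hypothetical helper, not anything AMD provides: the first digit maps to a system class, and because the last two digits follow no fixed formula, clock speeds come from a table of the three launch parts only.

```python
# Hypothetical helper illustrating the naming scheme described above.
SYSTEM_CLASS = {1: "1-way", 2: "2-way", 8: "8-way"}   # first digit of the model number

# Launch models and announced clock speeds (GHz); a table, not a formula.
LAUNCH_CLOCKS_GHZ = {240: 1.4, 242: 1.6, 244: 1.8}

def decode_opteron(model: int) -> dict:
    """Split an Opteron model number into its system class and relative rating."""
    first_digit = model // 100
    if first_digit not in SYSTEM_CLASS:
        raise ValueError(f"unrecognized Opteron series: {model}")
    return {
        "system": SYSTEM_CLASS[first_digit],
        "rating": model % 100,                        # relative performance index, not MHz
        "clock_ghz": LAUNCH_CLOCKS_GHZ.get(model),    # None if the part isn't in the table
    }

for model in (240, 242, 244):
    print(model, decode_opteron(model))
```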
SIDEBAR: 1P and 8P Opterons will launch later this quarter.
Test Conditions/Motherboards
We were recently given the opportunity to visit Einux in Milpitas, California, for a little hands-on experience with several Opteron 242-based servers. Though unable to conduct a thorough performance evaluation this time around, we were able to verify that the processors function normally in Windows 2000 Professional (32-bit), 64-bit Linux, and a beta version of .NET Server 2003 Enterprise Edition (64-bit). Running 32-bit Windows 2000, the dual 1.6GHz Opteron system posted numbers just below those of a dual 2.4GHz Xeon server in SiSoft Sandra’s Dhrystone and Whetstone computation tests. Of course, these tests come nowhere near painting a complete picture of what AMD’s Opteron is capable of, so we’ll have to wait until a more suitable benchmarking environment is in place.
What we did come away with was a broader sense of which supporting platforms will be available immediately and which will materialize a short time later. We also got a more accurate idea of how much demand exists for Opteron, even at launch, and a tangible sense that AMD is ready with product right now. In other words, this isn’t just another paper launch to tease the masses.
A host of motherboard manufacturers have been more than willing to showcase their Athlon 64 board designs, but we haven’t seen many multiprocessor Opteron boards. Obviously, there won’t be many performance enthusiasts interested in a platform without AGP connectivity, but AGP-equipped boards are on the way. For now, we’ve got MSI’s K8D Master-FM, an Arima board sold under the Accelertech brand, and a Selectron platform, which will probably never see retail availability, though at one time it promised a unique feature.
The MSI K8D Master-FM should be available within days of the Opteron launch, bringing with it dual-processor support. It is based on the AMD 8131 I/O Bus Tunnel and the 8111 I/O Hub. It can accommodate up to 12GB of DDR333 memory via six DIMMs, though all installed modules must be registered DIMMs. One of the board’s PCI-X slots can operate at 133MHz, while the other two 64-bit PCI slots operate at 66MHz; the remaining 32-bit slots run at 33MHz. As is becoming standard server fare, the K8D Master-FM offers dual Gigabit Ethernet by way of two Broadcom controllers. Finally, the board provides integrated Rage XL graphics for basic video output, along with standard back-panel connectors, including USB 2.0.
Accelertech’s board is similar in many regards, but slightly more feature-complete. It sports eight DIMM slots rather than six, along with a 16GB memory ceiling. Like the K8D Master-FM, the Accelertech MBO2161 employs the AMD 8111 and 8131 components, but it adds an extra PCI-X slot, which can operate at 100MHz. The only other notable difference is Accelertech’s inclusion of Promise’s PDC20319 Serial ATA controller, which offers four channels and RAID 0, 1, and 10.
The last board, from Selectron, is largely similar to the other two, only it boasts two inconspicuous connectors, each capable of accommodating an expansion card with an additional processor. In essence, this board would allow a dual-processor server to function as a four-way system simply by installing the expansion cards and replacing the processors. However, at this point it seems highly unlikely that we’ll ever see the Selectron board, as the manufacturer has apparently bowed out of the Opteron race a bit early.
SIDEBAR: AMD will begin shifting its 64-bit processors to 0.09-micron early next year.
Motherboards (cont’d)
Finally, we have uncovered details surrounding several workstation-centric boards that are still a couple of months out. Though still shrouded by non-disclosure agreements, we can say that in the near future, we expect to see powerful workstation systems powered by the AMD 8000-series chipset and NVIDIA’s nForce3 Professional. Both solutions will offer an AGP Graphics Tunnel and an AGP 8x slot. Of course, nForce3 will only support the single-processor Opteron (the 100 series), but we expect this will be the combination to buy if you’ve got money to spare and want the latest and greatest. In fact, Einux has already announced its NeoStation64 workstation, equipped with an Opteron running on an nForce3 Professional board and supporting Serial ATA RAID, AGP 8x, and up to 8GB of memory. Rex Wong, President of Einux, claims the system will be available this summer.
I was admittedly skeptical of AMD’s potential to succeed in a market dominated by Intel, and only time will tell whether AMD’s 64-bit plans will pan out. However, it is clear that Opteron, even backed by a limited 64-bit infrastructure, is very real. The 1.4GHz Opteron 240 is priced at $283, the 1.6GHz Opteron 242 costs $690, and the 1.8GHz Opteron 244 has been set at $794. Remember, you’ll need two of them to build a dual-processor server. Gamers will undoubtedly want to wait for Athlon 64, while more dedicated enthusiasts are better off reevaluating the situation in a couple of months.
Of course, we’ll revisit benchmarking once the workstation platforms gain some momentum. For now, though, it’s good to see Opteron garnering 64-bit operating system support in the form of Linux, and even the beta copy of .NET Server Enterprise Edition we ran seemed fairly solid in the limited time we spent with it. And that’s a 64-bit OS from Microsoft, an important endorsement to be sure.
In chatting with Tim Sweeney of Epic, it became clear that his game development efforts will be greatly enhanced by Opteron’s capabilities, and he feels that at some point end users, too, will realize a tangible gain from the transition to 64-bit. For the time being, though, most of the performance benefits seem to stem from the architectural optimizations AMD has focused on.