Summary: SCSI vs SATA? ECC Chip Kill vs Regular DDR? NVIDIA 16x FSAA? In day 2 of Eternal Battle 2005, we build our Ultimate Workstation. Total cost? $9180.
Recap
Yesterday we went over the construction of a high-end gaming PC. We didn’t go “all out” in terms of coming up with exotic arrays of hard drives, or turn to liquid cooling, but we did select some of the highest performing products on the market. FiringSquad normally only pulls me out of “retirement” for one of these system building articles, but today we’re raising the bar and going with back-to-back system build articles. So for today, we’re looking at the ultimate workstation.
What makes a workstation?
The definition of a workstation is somewhat unclear. If you do work on a desktop, does it become a workstation? If you put a workstation on your desk and play games on it, does it become a desktop? The answer is not clear and it is something that can be debated. One key difference appears to be the way in which customers approach a product and their goals. Workstation buyers typically think in terms of application-execute-units. That is, they’re not as interested in just having a fast computer, but are interested in how many layers of real-time high-definition video processing the CPU can do, the speed at which it processes RAW images, or how long a frame will take to render. Therefore, for our workstation, we need to define our goals. Well, if we want dual PCI Express video cards, we’re going to need to go with nForce 4 Professional, so that means the AMD Opteron platform.
The answer ends up being far easier than you’d think. The right answer for the vast majority of people, even those who have won the lottery, will be a pair of Opteron 252s. Although dual-core CPUs will be the right choice in the long run, the clock speed of today’s single-core CPUs is almost 20% higher than that of the fastest dual-core CPU. This means that unless you are using an in-house developed application that is trivially parallel, or are working with large amounts of digital photography RAW processing, the Opteron 252 is not only a better value than the 275 in terms of the price/performance ratio, but also provides better absolute real-world performance. So instead of buying a pair of 275s at $1350 each, get a pair of 252s at $870 each, leaving almost $1k for your other components… you’ll need that $1k later. As a side note, this recommendation to go with single-core CPUs over dual-core CPUs does not hold in the desktop arena since the pricing pattern is different.
2x AMD Opteron 252 2x $870 http://www.amd.com Running Total: $1740
CPU Cooling
Socket 940 and Socket 939 coolers are virtually identical and so we’ll still use the Zalman CNPS7000B as our CPU cooler. There really isn’t a better cooler when it comes to balancing weight, cooling performance, noise, looks, and price. While an OEM builder may opt to go with a more traditional cooler from ThermalTake, AVC, or the like to save a few dollars here and there, for an enthusiast system builder, the CNPS7000B is a virtual no-brainer. The upcoming 9000 series from Zalman should be interesting, but we’re not confident that we would actually change our recommendation. Unless you’re going with radiator-based water-cooling, or are dealing with a 1U rackmount platform, we have no reason to recommend anything other than the Zalman CNPS7000B. I have 5 of them running my systems, all purchased at retail.
Arctic Silver 5
If there’s one item from these system builds that we recommend to each and every system builder, it’s the Arctic Silver 5 thermal grease. It’s easy to work with, it’s cheap in the long run, and today’s large heatspreaders make the concern about its possible conductivity and capacitance a non-issue.
Arctic Silver 5 – 3.5 g (enough for ~15 large CPU cores) $10 http://www.arcticsilver.com/ Running Total: $1830
SIDEBAR: Arctic Silver doesn’t seem to dry out either.
What pushed Tyan ahead of competitors such as Iwill or ASUS was the dual PCIe x16 slots, made possible with two nForce Professional chipsets (both the 2200 and 2050 – you need both CPUs installed). While this makes SLI active only when using two CPUs, this is a non-issue considering the fact that the target market will always be building dual CPU systems. It’s things like this that reflect Tyan’s attention to detail and their development of a system board that meets their customers’ needs rather than just those of a marketing department. In fact, if you look at our workstation and server build articles, Tyan products have always ended up with our design win. They are expensive in terms of absolute cost, but their feature list justifies the recommendation. Although full-bandwidth PCI Express isn’t the limiting factor for games, it becomes more important for workstation 3D applications.
The feature list for the Thunder K8WE (S2895) reads like a wish list – it has virtually everything. The basics such as having 8 DIMM slots for NUMA performance, and dual NVIDIA Gigabit Ethernet are present, but it also has more PCIe bandwidth than any other platform on the market. There are also one 64-bit 133MHz PCI-X slot and two 64-bit 100MHz PCI-X slots to provide support for professional FC, SCSI, or 4:4:4 HD video capture add-on cards, and 6 fan headers. Although the only SATA controller is the one in the main NVIDIA nForce Professional 2200 chip, an on-board Ultra320 SCSI controller is available that taps into the PCI-X bus.
We would have liked an additional 4 SATA-II ports on the second nForce southbridge, but there wasn’t physical space on the motherboard (while keeping SCSI)! It would also have been nice if the K8WE preserved the S/PDIF output that was present on the original K8W. Fortunately, an add-on sound card serves as a reasonable solution, although the 4 slots taken up by SLI’d GPUs from NVIDIA make it tough.
[image]
Tyan Thunder K8WE $565 (non-SCSI) http://www.tyan.com Running Total: $2395
Why are dual processor motherboards so expensive?
At initial glance, it seems odd that a dual processor motherboard would cost more than two regular motherboards combined. In fact, dual processor motherboards have always been very expensive due to the increased complexity of routing all the traces. It’s not a linear increase in complexity or manufacturing costs. Likewise, the chipsets themselves are considerably more complex, and the costs of the plastic CPU sockets themselves are higher too. Considering the amount of engineering that goes into motherboards (both single and dual processor), they are relative bargains, and if your usage pattern demands two physical CPUs, the price premium is a small price to pay. If you’re just an enthusiast, the advent of consumer-grade dual-core CPUs is by far the smarter approach. Don’t get a dual Opteron just because you think it’d be cool.
Some people will want 4GB or more and unfortunately, the market for low-latency registered RAM is small. 1GB “server” modules from both OCZ and Corsair use slower 3-3-3-8 timings. 1GB registered DIMMs at 2.5-3-3-6 are available from Kingston, but they’re even more expensive. It costs $310 for 2x0.5GB of 2-3-2-6 Corsair RAM, and $720 for 2x1GB of Kingston 2.5-3-3-6. The 2x1GB of standard 3-3-3-8 Corsair RAM is just $280. That’s pretty cheap and is almost as cheap as Corsair’s ValueRAM for budget desktops. Since we did want this system to reflect maximum stability, we’ve opted to go with Corsair’s 1GB 3-3-3-8 DIMMs. While these are slower than the XMS low latency DIMMs that Corsair also produces, the conventional server PC3200 is JEDEC certified and runs stably without the need for any additional cooling.
4 x Corsair CM72SD1024RLP-3200 4x $140 http://www.corsair.com Running Total: $2955
Why Are ECC and Registered Memory So Important?
If you left your computer on 24 hours a day, 7 days a week, at the end of the year, a bit in memory would have been inappropriately flipped anywhere from two to twelve times. It depends on how much RAM you have. That is to say that somewhere in memory, a 0 turned into a 1, or a 1 turned into a 0. This can be caused by cosmic rays flying through your RAM or by the decay of the minute amounts of radioactive isotopes found in your RAM (the impurity need only be a single atom). Most of the time, this flipped bit is unimportant. Not only is this rare (especially when dealing with only one system), occurring maybe only once a month, but the flipped bit may be in unallocated memory, or maybe it simply altered the position of a pixel for a fraction of a second. So, for most people it’s no big deal.
ECC is able to just fix that one changed bit. If two bits are changed, you’ve got no recourse. That’s where ChipKill comes in. This is an advanced form of ECC developed by IBM for the Mars Pathfinder mission. It is almost like having RAID for your memory. When data is written to RAM, a checksum is also written to a different part of memory. If a failure occurs, data can be recovered by recalculating the checksum. This allows multi-bit errors to be detected and corrected. If it’s your desktop PC that crashes, that’s OK, but you wouldn’t say that for a server or workstation. The problem with ECC is that it will slow down performance (which is why vendors often benchmark with ECC off) and it automatically means that memory modules are at least 12.5% more expensive since you need 12.5% more memory (8 check bits for every 64 data bits).
Registered memory is like having a buffer. The key concept is “address loading.” In a regular unbuffered (non-registered) architecture, the address signal from the memory controller is sent to every RAM chip on every DIMM module. As you increase the number of banks and increase the load on the memory controller, the signal from the memory controller deteriorates from the ideal square wave to a sine-like wave where the signal rises and falls very slowly, lengthening the signal. This can cause timing errors because the chipset will try to read a data signal that is not yet complete. This is why the very original Athlon64s only supported 3 DIMMs; the on-board memory controller of modern Athlon64s has improved. With a register, the memory controller only addresses the register chip – one load rather than 16. On the next RAM clock cycle (half a system clock cycle since it is DDR), the register will send the signal to the RAM chips on the module. This ensures that the communication between the memory controller and RAM is timed precisely. The disadvantage is that there is a slight performance hit because of the additional latency.
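To make the "fix one bit, give up on two" behavior of ECC a little more concrete, here is a minimal Python sketch of a textbook Hamming code with one extra overall parity bit (often called SECDED). It is scaled down to 8 data bits, whereas a real ECC DIMM protects 64 data bits with 8 check bits, and it is not meant to describe how any particular memory controller or ChipKill itself is built. The function names are made up for this example.

```python
# Toy SECDED code: 8 data bits + 4 Hamming check bits + 1 overall parity bit.
# Assumption: this is the classic textbook scheme, scaled down for readability.
PARITY_POSITIONS = (1, 2, 4, 8)   # power-of-two slots hold the check bits
CODE_LEN = 13                     # slot 0 = overall parity, slots 1..12 = Hamming code
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p not in PARITY_POSITIONS]


def _group_parity(code, p):
    """Parity of every slot whose index has bit p set (slot 0 excluded)."""
    return sum(code[pos] for pos in range(1, CODE_LEN) if pos & p) % 2


def encode(byte):
    """Return a 13-element list of bits protecting one byte."""
    code = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        code[p] = _group_parity(code, p)   # makes each group's parity even
    code[0] = sum(code[1:]) % 2            # makes the whole word's parity even
    return code


def decode(code):
    """Return (byte, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The failing parity groups add up to the index of a single flipped slot.
    syndrome = sum(p for p in PARITY_POSITIONS if _group_parity(code, p))
    overall_even = sum(code) % 2 == 0
    if syndrome == 0 and overall_even:
        status = "ok"
    elif not overall_even and syndrome < CODE_LEN:
        code[syndrome] ^= 1                # syndrome 0 means slot 0 itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable"       # e.g. two flipped bits: detected only
    byte = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


word = encode(0xA5)
one_flip = list(word)
one_flip[6] ^= 1                           # simulate a single cosmic-ray hit
two_flips = list(word)
two_flips[3] ^= 1
two_flips[9] ^= 1
print(decode(one_flip))                    # single flip is repaired
print(decode(two_flips))                   # double flip is only detected
```

In this toy version the overhead is 5 extra bits per 8 data bits; the 12.5% figure quoted above comes from the same idea applied to 64-bit words, where 8 check bits are enough.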
Think of the register as the person at the front desk who’ll relay your message. If your voice is loud enough, it’s quicker to yell to the entire factory. If it’s not, it’s better to pass your message to the front desk, who can then directly page the recipient. The other way to think about it is that as you add more and more memory to a regular system, memory timing needs to slow down. With registered DIMMs, you just have to face fixed register latency, but can then run 2-3-2. In other words, if you ran a system with only a little bit of memory, registering can slow you down. If you had a system with more banks of memory, not only is registered DDR more stable, but it might be faster. For this server-grade reliability, the Opteron as well as server-grade Xeon motherboard chipsets are designed for registered ECC DDR-RAM. While all registered DDR is ECC (due to marketing/practical issues), not all ECC memory is registered.
Why we need 64-bit operating systems
Does 4GB sound like an unimaginable amount of system RAM? It did at one point. Even though no desktop runs 4GB, I think most people can agree that 4GB isn’t a “ridiculous” size. There is a problem of memory addressing vs. physical RAM. With 32-bit CPUs, it’s possible to address a maximum of 4GB. The problem is that the region between 3 and 4GB has been “reserved” by AGP, PCI, and PCI Express devices. This is an addressing problem, not a physical lack of memory. What you need to do is to remap the displaced physical RAM to the region above 4GB. It’s like forwarding the address. When software asks to read from location 4.5GB, the motherboard can auto-route that to the corresponding region in physical RAM. The problem is that Windows XP Pro SP2 with DEP cannot reliably deal with this remapping, so even though you have 4 physical GB of RAM, it is possible to have a system where only 2.5GB is available for use. Windows XP Professional x64 Edition solves this problem. So it’s not about having 4GB of RAM so much as being able to have an SLI system with more than 2GB of RAM.
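The arithmetic behind that addressing gap can be sketched in a few lines of Python. The size of the reserved window used in the example (1.5GB) is an assumed figure, picked only because it reproduces the 2.5GB case mentioned above; the real size depends on which cards are installed. The function names are hypothetical.

```python
GIB = 2**30                     # one gigabyte in bytes (binary)
ADDRESS_SPACE_32BIT = 2**32     # a 32-bit address reaches exactly 4 GiB


def usable_ram_32bit(installed_bytes, reserved_bytes):
    """RAM that fits below 4 GiB once devices reserve the top of the address space."""
    window_for_ram = ADDRESS_SPACE_32BIT - reserved_bytes
    return min(installed_bytes, window_for_ram)


def displaced_ram(installed_bytes, reserved_bytes):
    """RAM that has to be remapped above 4 GiB before software can reach it."""
    return installed_bytes - usable_ram_32bit(installed_bytes, reserved_bytes)


installed = 4 * GIB
reserved = 3 * GIB // 2         # assumed: 1.5 GiB claimed by graphics and other devices

print(usable_ram_32bit(installed, reserved) / GIB)   # visible without remapping
print(displaced_ram(installed, reserved) / GIB)      # needs addresses above 4 GiB
```

With 2GB installed and the same reserved window, nothing is displaced, which is why the problem only shows up once a system combines a lot of RAM with cards that claim a lot of address space.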
The Turbo-Cool 850 SSI represents the first time that PC Power & Cooling has adopted a multi-rail design for their flagship Turbo-Cool brand. In the case of this PSU, it’s four +12V rails at 17A each with a true total of 54A sustained and 62A peak. There are 30A on the +5V and 20A on the +3.3V with continuous 850W of power at 50 degrees C, and a peak output of 950W. This power comes with finesse as well. The main +3.3V, +5V, and +12V rails are all tightly regulated to 1%. The fan runs at a modest 32 dB, just 2 dB louder than the conventional 510 watt PC Power & Cooling model. While this is louder than some other PSUs, 32 dB is remarkable for a 950W peak power unit. PC Power & Cooling also tends to be more conservative, running the fans at a higher rate to ensure power stability under truly extreme environments.
Multiple vs. Single Rails
Multiple 12V rails were a hot topic in the past. With quad-rail 12V PSUs on the market now, some of you may be wondering when we’ll start to see quintuple-rail PSUs. The important thing to realize is that more rails are not necessarily a good thing. Now, in the case of today’s quad-rail PSUs from PC Power & Cooling and Silverstone, the additional 12V rails are a good design choice, but that’s not because of the 4 rails.
Chassis
Yesterday we talked about the importance of the BTX-style inverted motherboard design and discussed the strengths and weaknesses of the Lian-Li V-1000, V-1200, and V-2000 series as well as the Enermax MaxFlow/Silverstone Temjin T06. The considerations that go into a workstation chassis are pretty similar. We went with the Lian-Li V-1200 for the workstation. Since Lian-Li did not have a review program, Newegg.com sponsored this portion of the article by sending us a retail unit.
We went with a pair of striped Hitachi T7K250s for the desktop, and so for the workstation, we’ll be going with a pair of Maxtor MaxLine IIIs in RAID-1 mirroring. Although Maxtor has a reputation for poor reliability on the Internet message boards, this reputation may be ill-deserved. Immediately after their merger with Quantum in 2001, Maxtor was the largest disk drive company in the world, and with more drives on the market in the hands of end-users, one would expect more reports of failures. (Since then, Seagate has grown to #1 and Western Digital has reached #2.) The other explanation for the negative perception of Maxtor is that if you compare Western Digital and Maxtor drives, the Maxtors seem to draw a bit more power and it’s possible that a bad power supply was to blame. The MaxLine III is Maxtor’s flagship, and is certified for a higher workload than the DiamondMax 10. With its 16MB cache and NCQ, it should be a very fast performer. We got our drives from Directron.com – they’re a good alternative to Newegg, especially if you’re located in California (where sales tax is required at Newegg).
Pair of Maxtor MaxLine III 2x $200 http://www.maxtor.com Running Total: $4035
If SCSI is so great, why don’t more people use it?
If I won the lottery, I’d look toward SCSI drives. First let’s talk about the strengths of regular Ultra320 SCSI. One clear benefit is in the name, 320MB/sec. That’s faster than even SATA-II. Likewise, although SATA-II now has NCQ support, the queue is only 32 levels deep – SCSI supports a queue of 256 levels and, more importantly, whereas SATA’s queue only offers simple tags, in SCSI data packets can have special tags such as “head of queue.” The ability for a drive to return out-of-order data actually is NOT a feature of SCSI, but the more advanced command queuing makes up for this. Serial Attached SCSI is the next generation of SCSI, in the same way that Serial ATA has succeeded ATA/133. One of the main benefits of SAS will be full-duplex communication across the bus, meaning that the drive can deliver data to the SAS controller at the same time the SAS controller is delivering data to the drive. Serial ATA is half-duplex.
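The queuing difference described above can be pictured with a toy Python model of a tagged command queue. It only shows two ideas from the text: a fixed queue depth (32 for SATA NCQ, 256 for SCSI, as quoted above) and a “head of queue” tag that jumps ahead of “simple” tags. The class and method names are invented for this sketch, and real drive firmware also reorders commands based on head position, which is ignored here.

```python
from collections import deque


class TaggedQueue:
    """Bounded command queue with 'simple' and 'head_of_queue' tags (toy model)."""

    def __init__(self, depth):
        self.depth = depth
        self._commands = deque()

    def submit(self, command, tag="simple"):
        """Queue a command; return False if the queue is already full."""
        if len(self._commands) >= self.depth:
            return False                        # host has to wait and retry
        if tag == "head_of_queue":
            self._commands.appendleft(command)  # served before everything queued
        elif tag == "simple":
            self._commands.append(command)      # served in arrival order here
        else:
            raise ValueError(f"unknown tag: {tag}")
        return True

    def next_command(self):
        """Return the next command to execute, or None if the queue is empty."""
        return self._commands.popleft() if self._commands else None


sata_like = TaggedQueue(depth=32)
accepted = [sata_like.submit(f"read block {n}") for n in range(40)]
print(accepted.count(True))                     # only the first 32 fit

scsi_like = TaggedQueue(depth=256)
scsi_like.submit("read block 1")
scsi_like.submit("read block 2")
scsi_like.submit("flush journal", tag="head_of_queue")
print(scsi_like.next_command())                 # the urgent command goes first
```

The deeper queue matters most when many requests are outstanding at once, which is more typical of servers and heavily loaded workstations than of a single-user desktop.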
Of course, the question is between the GeForce 6800 Ultra/GeForce 7800GTX and the Quadro FX 4400. Although built around the same underlying design, today’s Quadros and GeForces are more than just driver rewrites and modifications to the BIOS – they are truly different cores today.
[image]
It’s interesting to note that the sub-pixel precision of NVIDIA’s platform continues to be superior to ATI’s, even over the last two years. The entire Quadro FX line, including the bottom-of-the-line Quadro FX 330, features 12-bit sub-pixel accuracy. In fact, even the GeForce FX 5200, which most gamers would be ashamed to own, features 12-bit sub-pixel accuracy. ATI’s workstation models don’t compete in this feature set. Their FireGL V3100 has a terrible 4-bit sub-pixel accuracy, the same as the original Rage128, and the newer FireGL V5100 only reaches 8-bit precision. Sub-pixel accuracy is important for placement of polygons and this will make the biggest difference in CAD and 3D DCC applications. 16x FSAA is also a very important feature, more so than in traditional gaming, because lines rather than textures capture much of the data. (Be sure to view the full-resolution pictures and not the auto-resized ones from FiringSquad to see the full effect.)
[image]
[image]
[image]
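A quick way to see why the number of sub-pixel bits matters is to snap a vertex coordinate to the grid each precision allows. This Python sketch assumes that n bits of sub-pixel precision means positions are rounded to the nearest 1/2^n of a pixel, which is the usual meaning of the term; it models nothing else about either vendor’s hardware, and the function names are made up for this example.

```python
def snap(coord, subpixel_bits):
    """Round a pixel coordinate to the nearest representable sub-pixel position."""
    steps_per_pixel = 1 << subpixel_bits        # 4 bits -> 16 steps, 12 bits -> 4096
    return round(coord * steps_per_pixel) / steps_per_pixel


def worst_case_error(subpixel_bits):
    """Largest possible snapping error in pixels: half of one grid step."""
    return 1 / (1 << (subpixel_bits + 1))


vertex_x = 10.3                                 # arbitrary example coordinate
for bits in (4, 8, 12):
    snapped = snap(vertex_x, bits)
    print(bits, snapped, abs(snapped - vertex_x), worst_case_error(bits))
```

Each extra bit halves the worst-case placement error, so going from 4 bits to 12 bits shrinks it by a factor of 256. In a dense CAD wireframe, that is the difference between edges that line up and edges that visibly wobble or leave gaps as the model moves.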
There’s little doubt that these Quadros will be extremely fast performers for workstation applications. The question is how well they do with games.
NVIDIA Quadro FX4400 512MB PCI-E 2x $1800 http://www.nvidia.com Running Total: $7835
SIDEBAR: That’s right. We spent just as much on graphics as we did for the entire desktop computer.
The requirements for workstations aren’t significantly different from desktops, so we’ll just breeze through the other components.
Plextor PX-716SA $125 http://www.plextor.com Running Total: $7960
Logitech Cordless Comfort Duo $100, Logitech MX700 (Refurbished) $50 http://www.logitech.com Running Total: $8110
1.44MB Floppy Drive $15 Running Total: $8125
Monitor
Workstation users don’t require the same pixel refresh rates that gamers need. At the very most, you’ll need the ability to handle smooth 30 fps video. Our monitor of choice here is going to be the NEC 1980FXi, which was a FiringSquad Editor’s Choice product and acts as our current reference LCD monitor. It has an 18ms S-IPS panel (recall that 25 ms S-IPS is virtually as fast as 16 ms TN+film) and rich color support.
Operating System
Windows XP 32-bit SP2 is still needed for universal compatibility, but Windows XP Professional x64 Edition would be the right approach for maximizing the 4GB of memory. Although 64-bit applications are rare, Photoshop CS2 has a partial 64-bit mode in which more than 2GB of RAM can be used by Photoshop. Home-brew or hand-tuned Linux is an option for those experienced with Linux systems, but for a commercial package, SUSE Linux 9.3 Professional is the standard. Triple booting all three works.
Conclusion
Wow. A 9 thousand dollar system, and I’ve just got a single 19” monitor? The lesson learned here is that when it comes to high-precision, high-performance graphics, it can get quite expensive. Without the Quadros, we’d essentially be able to bring the costs back down to something a little bit more palatable. That said, the pair of Quadro FX 4400s means that I can drive four Apple 30-inch Cinema HD screens. I could very easily have built it as a $20,000 workstation … but that would be plain crazy.
© Copyright 2003 FS Media, Inc.