
Eternal Battle Day 2: The Ultimate Workstation
June 27, 2005

Summary: SCSI vs SATA? ECC Chip Kill vs Regular DDR? NVIDIA 16x FSAA? In day 2 of Eternal Battle 2005, we build our Ultimate Workstation. Total cost? $9180.


Part 2: The Ultimate Workstation

Recap

Yesterday we went over the construction of a high-end gaming PC. We didn’t go “all out” in terms of coming up with exotic arrays of hard drives, or turn to liquid cooling, but we did select some of the highest performing products on the market. FiringSquad normally only pulls me out of “retirement” for one of these system building articles, but today we’re raising the bar and going with back-to-back system build articles. So for today, we’re looking at the ultimate workstation.

What makes a workstation?

The definition of a workstation is somewhat unclear. If you do work on a desktop, does it become a workstation? If you put a workstation on your desk and play games on it, does it become a desktop? The answer is not clear and it is something that can be debated. One key difference appears to be the way in which customers approach a product and their goals. Workstation buyers typically think in terms of application-execute-units. That is, they’re not as interested in just having a fast computer, but are interested in how many layers of real-time high-definition video processing the CPU can do, the speed at which it processes RAW images, or how long a frame will take to render. Therefore, for our workstation, we need to define our goals.

In this “ultimate” workstation build, I want to build a system that can do the following three tasks very well:
1) Manipulate complex 3D models with high precision and high interactivity.
2) Edit and process digital photographs produced by high-end D-SLRs and digital medium format backs.
3) Handle video compositing and high-definition transcoding with aplomb.

So with those goals in mind, let’s start our ultimate workstation build. As always, we start with the platform.



SIDEBAR:

Well, if we want dual PCI Express video cards, we’re going to need to go with nForce 4 Professional, so that means the AMD Opteron platform.


Two or Four CPU cores?

Yesterday I said that the future of desktop computing lies in dual-core systems. On the workstation front, it’s essentially a given that a dual processor system makes the most sense. 3D content creation, CAD/CAE, digital photography, and scientific computing applications are almost all engineered for multiprocessor support. The question of course is whether it’s better to go with two or four cores.

The answer ends up being far easier than you’d think. The right answer for the vast majority of people, even those who have won the lottery, will be a pair of Opteron 252s. Although dual-core CPUs will be the right choice in the long run, today’s fastest single-core CPUs are clocked almost 20% higher than the fastest dual-core CPU. This means that unless you are using an in-house application that is trivially parallel, or are processing large numbers of digital photography RAW files, the Opteron 252 is not only a better value than the 275 in terms of price/performance, but also provides better absolute real-world performance. So instead of buying a pair of 275s at $1350 each, get a pair of 252s at $870 each, leaving almost $1k for your other components… you’ll need that $1k later. As a side note, this recommendation to go with single-core CPUs over dual-core CPUs does not hold in the desktop arena since the pricing pattern is different.
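To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The clock speeds (2.6 GHz for the Opteron 252, 2.2 GHz for the dual-core 275) are the parts' rated speeds. The throughput model itself is a deliberately crude assumption of mine: work done scales with clock speed times the number of cores the application can actually keep busy. The function names are made up for illustration.

```python
# Crude model: throughput ~ clock (GHz) x number of cores the app keeps busy.
# It ignores cache, memory bandwidth and scaling losses, so treat it as an
# illustration of the argument, not as a benchmark.

OPTERON_252 = {"clock_ghz": 2.6, "cores_per_cpu": 1, "price": 870}
OPTERON_275 = {"clock_ghz": 2.2, "cores_per_cpu": 2, "price": 1350}


def relative_throughput(cpu, usable_threads, sockets=2):
    """Estimate throughput for an app that can use `usable_threads` threads."""
    total_cores = cpu["cores_per_cpu"] * sockets
    busy_cores = min(total_cores, usable_threads)
    return cpu["clock_ghz"] * busy_cores


def compare(usable_threads, sockets=2):
    """Return the modelled throughput of both dual-socket configurations."""
    return {
        "252": relative_throughput(OPTERON_252, usable_threads, sockets),
        "275": relative_throughput(OPTERON_275, usable_threads, sockets),
    }


# Price difference between a pair of 275s and a pair of 252s.
price_gap = 2 * OPTERON_275["price"] - 2 * OPTERON_252["price"]

if __name__ == "__main__":
    for threads in (1, 2, 4):
        print(threads, "threads:", compare(threads))
    print("saved by choosing 252s: $", price_gap)
```

Under this model the 252s win whenever the application keeps two or fewer cores busy, and the 275s only pull ahead once all four cores are loaded, which is the trivially parallel case described above.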

2x AMD Opteron 252
2x $870
http://www.amd.com
Running Total: $1740


CPU Cooling

Socket 940 and Socket 939 coolers are virtually identical and so we’ll still use the Zalman CNPS7000B as our CPU cooler. There really isn’t a better cooler when it comes to balancing weight, cooling performance, noise, looks, and price. While an OEM builder may opt to go with a more traditional cooler from ThermalTake, AVC, or the like to save a few dollars here and there, for an enthusiast system builder, the CNPS7000B is a virtual no-brainer. The upcoming 9000 series from Zalman should be interesting, but we’re not confident that we would actually change our recommendation. Unless you’re going with radiator-based water-cooling, or are dealing with a 1U rackmount platform, we have no reason to recommend anything other than the Zalman CNPS-7000B. I have 5 of them running my systems, all purchased at retail.

Zalman CNPS7000B-AlCu
2x $40
http://www.zalmanusa.com/
Running Total: $1820

Arctic Silver 5

If there’s one item from these system builds that we recommend to each and every system builder, it’s the Arctic Silver 5 thermal grease. It’s easy to work with, it’s cheap in the long run, and today’s large heatspreaders make concerns about its possible electrical conductivity and capacitance a non-issue.

[image]


Arctic Silver 5 – 3.5 g (enough for ~15 large CPU cores)
$10
http://www.arcticsilver.com/

Running Total: $1830



SIDEBAR: Arctic Silver doesn’t seem to dry out either.


Workstation Motherboard

What pushed Tyan ahead of competitors such as Iwill or ASUS was the dual PCIe x16 slots, made possible with two nForce Professional chipsets (both the 2200 and the 2050; you need both CPUs installed). While this makes SLI active only when using two CPUs, this is a non-issue considering that the target market will always be building dual-CPU systems. It’s things like this that reflect Tyan’s attention to detail and its development of a system board that meets customers’ needs rather than just those of a marketing department. In fact, if you look at our workstation and server build articles, Tyan products have always ended up with our design win. They are expensive in terms of absolute cost, but their feature list justifies the recommendation. Although full-bandwidth PCI Express isn’t the limiting factor for games, it becomes more important for workstation 3D applications.

The feature list for the Thunder K8WE (S2895) reads like a wish list: it has virtually everything. The basics, such as 8 DIMM slots for NUMA performance and dual NVIDIA Gigabit Ethernet, are present, but it also has more PCIe bandwidth than any other platform on the market. There are also one 64-bit 133MHz PCI-X slot and two 64-bit 100MHz PCI-X slots to support professional FC, SCSI, or 4:4:4 HD video capture add-on cards, plus 6 fan headers.

Although the only SATA controller is the one in the main NVIDIA nForce Professional 2200 chip, an on-board Ultra320 SCSI controller is available that taps into the PCI-X bus. We would have liked an additional 4 SATA-II ports on the second nForce southbridge, but there wasn’t physical space on the motherboard (while keeping SCSI)! It would also have been nice if the K8WE had preserved the S/PDIF output that was present on the original K8W. Fortunately, an add-on sound card serves as a reasonable solution, although the 4 slots taken up by SLI’d GPUs from NVIDIA make it tough to fit one.

[image]


Tyan Thunder K8WE
$565 (non-SCSI)
http://www.tyan.com

Running Total: $2395

Why are dual processor motherboards so expensive?

At first glance, it seems odd that a dual processor motherboard would cost more than two regular motherboards combined. In fact, dual processor motherboards have always been very expensive due to the increased complexity of routing all the traces. It’s not a linear increase in complexity or manufacturing costs. Likewise, the chipsets themselves are considerably more complex, and the costs of the plastic CPU sockets themselves are higher too. Considering the amount of engineering that goes into motherboards (both single and dual processor), they are relative bargains, and if your usage pattern demands two physical CPUs, the price premium is a small price to pay. If you’re just an enthusiast, the advent of consumer-grade dual-core CPUs is by far the smarter approach. Don’t get a dual Opteron just because you think it’d be cool.


SIDEBAR: Tyan motherboards are designed in the USA.


Workstation Memory

Some people will want 4GB or more and unfortunately, the market for low-latency registered RAM is small. 1GB “server” modules from both OCZ and Corsair use slower 3-3-3-8 timings. 1GB Registered DIMMs at 2.5-3-3-6 are available from Kingston, but they are even more expensive. It costs $310 for 2x0.5GB of 2-3-2-6 Corsair RAM and $720 for 2x1GB of Kingston 2.5-3-3-6, while 2x1GB of standard 3-3-3-8 Corsair RAM is just $280. That’s pretty cheap and is almost as cheap as Corsair’s ValueRAM for budget desktops.

Since we did want this system to reflect maximum stability, we’ve opted to go with Corsair’s 1GB 3-3-3-8 DIMMs. While these are slower than the XMS low latency DIMMs that Corsair also produces, the conventional server PC3200 is JEDEC certified and runs stably without the need of any additional cooling.

4 x Corsair CM72SD1024RLP-3200
4x $140

http://www.corsair.com
Running Total: $2955

Why are ECC and Registered Memory So Important?

If you left your computer on 24 hours a day, 7 days a week, then by the end of the year a bit in memory would have been inappropriately flipped anywhere from two to twelve times, depending on how much RAM you have. That is to say that somewhere in memory, a 0 turned into a 1, or a 1 turned into a 0. This can be caused by cosmic rays flying through your RAM or by the decay of the minute radioactive isotopes found in your RAM (the impurity need only be a single atom). Most of the time, this flipped bit is unimportant. Not only is this rare (especially when dealing with only one system), occurring maybe only once a month, but the flipped bit may be in unallocated memory, or maybe it simply altered the position of a pixel for a fraction of a second. So, for most people it’s no big deal.

That said, if you’re unlucky, this flipped bit can alter critical data and cause your system to crash, or if you are using the computer to solve a long math problem, it could potentially alter your results. If you’re running a Beowulf cluster with hundreds of PCs, the chances of encountering an error are also 100x greater...

ECC memory provides error checking and correction facilities. By adding an extra memory chip to every DIMM (1 extra bit for every 8), ECC memory makes your memory act in a similar way to a RAID array. Unlike parity, which can only detect single-bit errors, ECC reliably detects double-bit errors and, depending on the code, can catch some 3- and 4-bit errors as well. However, it is only able to fix a single-bit error.
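To make the "detect two, fix one" behavior concrete, here is a toy sketch of a single-error-correcting, double-error-detecting (SECDED) Hamming code. It protects 4 data bits with 4 check bits, whereas an ECC DIMM protects 64 data bits with 8 check bits; the small size is my simplification to keep it readable, and the function names are invented for illustration.

```python
# Toy SECDED code: Hamming(7,4) plus one overall parity bit.
# Index 0 holds the overall parity; indexes 1..7 are the Hamming positions,
# with check bits at positions 1, 2 and 4 and data bits at 3, 5, 6 and 7.

def encode(nibble):
    """Encode a 4-bit value into an 8-bit codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # make the total parity even
    return code


def decode(codeword):
    """Return (status, value); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(codeword)
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None

    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, value
```

Flip any one bit of `encode(0b1011)` and `decode` hands back the original value with status `corrected`; flip two and it reports `uncorrectable` rather than returning bad data.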




SIDEBAR: An operating system designed to take advantage of NUMA can improve performance significantly.


IBM Chip Kill

ECC is able to just fix that one changed bit. If two bits are changed, you’ve got no recourse. That’s where ChipKill comes in. This is an advanced form of ECC developed by IBM for the Mars Pathfinder mission. It is almost like having RAID for your memory. When data is written to RAM, a checksum is also written to a different part of memory. If a failure occurs, the data can be recovered using the checksum. This allows multi-bit errors to be detected and corrected. A crash may be tolerable on your desktop PC, but it isn’t on a server or workstation. The problem with ECC is that it slows down performance (which is why vendors often benchmark with ECC off) and it automatically means that memory modules are at least 12.5% more expensive since you need 12.5% more memory.

Registered memory is like having a buffer. The key concept is “address loading.” In a regular unbuffered (non-registered) architecture, the address signal from the memory controller is sent to every RAM chip on every DIMM. As you increase the number of banks, and with it the load on the memory controller, the signal deteriorates from the ideal square wave to a sine-like wave that rises and falls very slowly, lengthening the signal. This can cause timing errors because the chipset will try to read a data signal that has not yet settled. This is why the very first Athlon64s only supported 3 DIMMs; the on-board memory controller of modern Athlon64s has improved.


With a register, the memory controller only drives the register chip: one load rather than 16. On the next RAM clock cycle (half a system clock cycle since it is DDR), the register sends the signal to the RAM chips on the module. This ensures that communication between the memory controller and the RAM is timed precisely. The disadvantage is a slight performance hit because of the additional latency.
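The RAID comparison can be illustrated with a plain XOR-parity sketch. To be clear, this only shows the analogy drawn above; it is not IBM's actual ChipKill code, whose details aren't covered here. The chip layout and function names are invented for illustration, and the sketch assumes the failed chip has already been identified.

```python
from functools import reduce
from operator import xor


def write_word(data_chips):
    """Store one value per data chip plus an XOR parity value alongside."""
    parity = reduce(xor, data_chips, 0)
    return list(data_chips), parity


def rebuild_chip(data_chips, parity, failed_index):
    """Recover the contents of one known-bad chip from the surviving chips."""
    survivors = [v for i, v in enumerate(data_chips) if i != failed_index]
    # XOR-ing the parity with every surviving value leaves the missing one.
    return reduce(xor, survivors, parity)


if __name__ == "__main__":
    stored, parity = write_word([0xA, 0x3, 0xF, 0x5])
    stored[2] = 0x0  # pretend chip 2 died and now returns garbage
    print(hex(rebuild_chip(stored, parity, failed_index=2)))
```

Because the rebuild never looks at the failed chip's output, it does not matter how many of that chip's bits went bad, which is the property that plain single-bit ECC lacks.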

Think of the register as the person at the front desk who’ll relay your message. If your voice is loud enough, it’s quicker to yell to the entire factory. If it’s not, it’s better to pass your message to the front desk, who can then directly page the recipient.

The other way to think about it is that as you add more and more memory to a regular system, memory timing needs to slow down. With registered DIMMS, you just have to face fixed register latency, but can then run 2-3-2. In other words, if you ran a system with only a little bit of memory, registering can slow you down. If you had a system with more banks of memory, not only is registered DDR more stable, but it might be faster.

For this server-grade reliability, the Opteron as well as server-grade Xeon motherboard chipsets are designed for registered ECC DDR-RAM. While all registered DDR is ECC (due to marketing/practical issues), not all ECC memory is registered.

Why we need 64-bit operating systems

Does 4GB sound like an unimaginable amount of system RAM? It did at one point. Even though no desktop runs 4GB, I think most people can agree that 4GB isn’t a “ridiculous” size. The problem is one of memory addressing vs. physical RAM. With 32-bit CPUs, it’s possible to address a maximum of 4GB. The catch is that the region between 3 and 4GB has been “reserved” by AGP, PCI, and PCI Express devices. This is an addressing problem, not a physical lack of memory. What you need to do is remap the hidden physical RAM to the region above 4GB. It’s like forwarding the address. When software asks to read from location 4.5GB, the motherboard can auto-route that to the right region in physical RAM. The problem is that Windows XP Pro SP2 with DEP cannot reliably deal with this remapping, so even though you have 4 physical GB of RAM, it is possible to have a system where only 2.5GB is available for use. Windows XP Professional x64 Edition solves this problem. So it’s not about having 4GB of RAM so much as being able to have an SLI system with more than 2GB of RAM.
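The arithmetic behind that 2.5GB figure can be sketched in a few lines. The size of the reserved device window varies from system to system; the 1.5GB used below is an assumed value I picked because it reproduces the 2.5GB case mentioned above, and the function name is made up for illustration.

```python
ADDRESS_SPACE_32BIT_GB = (2 ** 32) / (1024 ** 3)   # 4.0 GB


def visible_ram_gb(installed_gb, reserved_gb, remap_above_4gb=False):
    """Estimate how much installed RAM the OS can actually reach.

    reserved_gb is the slice of the 32-bit address space claimed by
    AGP/PCI/PCI Express devices (an assumed, system-specific figure).
    """
    if remap_above_4gb:
        # A 64-bit OS can reach RAM that has been relocated above 4GB,
        # so nothing stays hidden behind the device window.
        return installed_gb
    return min(installed_gb, ADDRESS_SPACE_32BIT_GB - reserved_gb)


if __name__ == "__main__":
    print(visible_ram_gb(4, 1.5))                         # 32-bit OS
    print(visible_ram_gb(4, 1.5, remap_above_4gb=True))   # 64-bit OS
    print(visible_ram_gb(2, 1.0))                         # small systems lose nothing
```

Note that a 2GB system loses nothing in this model, which is why the issue only shows up once you combine a lot of RAM with address-hungry devices such as a pair of large-memory video cards.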



SIDEBAR: A really good discussion about the 4GB limits and virtual address space can be found here.


PSU

As we mentioned yesterday, there are few unanimous opinions in computing, but when it comes to power supplies, the best of the best is PC Power and Cooling. With our workstation, if we wanted the best PSU, we’d need to go with the PC Power and Cooling 850W SSI. Although the PC Power and Cooling unit has the 6-pin SSI connector (visually identical to the PCI Express connector, only there are 3.3V and 12V rails instead of the pure 12V rails), the Tyan Thunder K8WE is designed to run on standard EPS12V power supplies, which have since been adopted under the BTX platform (24-pin main motherboard connector with an 8-pin secondary EPS12V connector).

The Turbo-Cool 850 SSI represents the first time that PC Power & Cooling has adopted a multi-rail design for their flagship Turbo-Cool brand. In the case of this PSU, it’s four +12V rails at 17A each with a true total of 54A sustained and 62A peak. There are 30A on the +5V and 20A on the +3.3V with continuous 850W of power at 50 degrees C, and a peak output of 950W.

This power comes with finesse as well. The main +3.3, +5, and +12V rails are all tightly regulated to 1%. The fan runs at a modest 32 dB, just 2 dB louder than the conventional 510 watt PC Power & Cooling. While this is louder compared to other PSUs, 32 dB is remarkable for a 950W peak power unit. PC Power & Cooling also tends to be more conservative, running the fans at a higher rate to ensure power stability under truly extreme environments.


Multiple vs. Single Rails

Multiple 12V rails were a hot topic in the past. With quad-rail 12V PSUs on the market now, some of you may be wondering when we’ll start to see quintuple-rail PSUs. The important thing to realize is that more rails are not necessarily a good thing. Now, in the case of today’s quad-rail PSUs from PC Power and Cooling and Silverstone, the additional 12V rails are a good design choice, but that’s not because of the 4 rails.

The fundamental determinant of power supply quality and capacity is the main transformer that converts from AC to DC – think of this as the master pump that’s delivering water to your home. Having additional rails is like having secondary pumps located near the faucet. A single rail system is akin to having the same pump delivering water to the shower and to the toilet. As long as the pump is sufficiently strong, it doesn’t matter if one person uses the shower and another person flushes the toilet. However, if the single pump isn’t strong enough trouble ensues.

With multiple rails, it is as if you had a dedicated pump for the toilet and a dedicated pump for the shower. Now flushing the toilet doesn’t affect the shower. This can improve stability, but it still requires that you have enough incoming flow from the outside. So when it comes to power supplies, additional rails can assist with stability but you still need to have a beefy transformer.

On a quad-rail system, power is typically divided among the GPU, CPU/motherboard, SATA, and Molex/motherboard. Here’s a question for you. What happens if you opt to use Molex power adapters exclusively and never touch a SATA power port? Well, all of the amps reserved for that rail are never used. You’d lose 17A of total capacity.
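Here is a small sketch of that rail bookkeeping, using the 17A per-rail and 54A sustained combined figures quoted for the 850W unit. The rail names follow the typical split described above; the example loads and function names are made up for illustration.

```python
RAIL_LIMIT_A = 17.0       # per-rail +12V limit quoted above
COMBINED_LIMIT_A = 54.0   # sustained total across all four +12V rails
VOLTS = 12.0


def check_load(loads_a):
    """loads_a maps a rail name to the amps drawn from it."""
    total = sum(loads_a.values())
    return {
        "overloaded_rails": [r for r, a in loads_a.items() if a > RAIL_LIMIT_A],
        "total_amps": total,
        "total_watts": total * VOLTS,
        "within_combined_limit": total <= COMBINED_LIMIT_A,
    }


def stranded_amps(loads_a):
    """Capacity locked up in rails that have nothing plugged into them."""
    return sum(RAIL_LIMIT_A for amps in loads_a.values() if amps == 0)


if __name__ == "__main__":
    # Hypothetical build that never uses the SATA power connectors.
    loads = {"gpu": 20.0, "cpu": 10.0, "sata": 0.0, "molex": 5.0}
    print(check_load(loads))
    print(stranded_amps(loads))
```

In this made-up example the total draw is comfortably under the combined limit, yet the GPU rail is over its own 17A ceiling while the idle SATA rail's 17A sits unused. That is the downside of splitting rails when the load isn't spread the way the designer expected.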

PC Power & Cooling 850W SSI
$470
http://www.pcpow.com

Running Total: $3425


Chassis

Yesterday we talked about the importance of the BTX-style inverted motherboard design and discussed the strengths and weaknesses of the Lian-Li V-1000, V-1200, and V-2000 series as well as the Enermax MaxFlow/Silverstone Temjin T06. The considerations that go into a workstation chassis are pretty similar. We went with the Lian Li V-1200 for the workstation. Since Lian-Li did not have a review program, Newegg.com sponsored this portion of the article by sending us a retail unit.


Lian Li V-1200
$210 at Newegg.com
Newegg.com
Running Total: $3635




Hard Drives


We went with a pair of striped Hitachi T7K250s for the desktop, and so for the workstation, we’ll be going with a pair of Maxtor MaxLine IIIs in RAID-1 mirroring. Although Maxtor has a reputation for poor reliability on Internet message boards, this reputation may be ill-deserved. Immediately after their merger with Quantum in 2001, Maxtor was the largest disk drive company in the world, and with more drives on the market in the hands of end-users, one would expect more reports of failures. (Since then, Seagate has grown to #1 and Western Digital has reached #2.) The other explanation for the negative perception of Maxtor is that if you compare Western Digital and Maxtor drives, the Maxtors seem to draw a bit more power and it’s possible that a bad power supply was to blame.

The MaxLine III is Maxtor’s flagship, and is certified for a higher workload than the DiamondMax 10. With its 16MB cache and NCQ, it should be a very fast performer. We got our drives from Directron.com; they’re a good alternative to Newegg, especially if you’re located in California (where sales tax is required at Newegg).

Pair of Maxtor MaxLine III
2x $200
http://www.maxtor.com

Running Total: $4035

If SCSI is so great, why don’t more people use it?

If I won the lottery, I’d look toward SCSI drives. First let’s talk about the strengths of regular Ultra320 SCSI. One clear benefit is in the name: 320MB/sec. That’s faster than even SATA-II. Likewise, although SATA-II now has NCQ support, the queue is only 32 levels deep. SCSI supports a queue of 256 levels and, more importantly, whereas SATA’s queue only offers simple tags, in SCSI commands can carry special tags such as “head of queue.” The ability for a drive to return out-of-order data is actually NOT a feature of SCSI, but the more advanced command queuing makes up for this. Serial Attached SCSI is the next generation of SCSI, the same way Serial ATA has superseded ATA/133. One of the main benefits of SAS will be full-duplex communication across the bus, meaning that the drive can deliver data to the SAS controller at the same time the SAS controller is delivering data to the drive. Serial ATA is half-duplex.

The real magic of SCSI, however, is the market. If you think of SCSI as being virtually equivalent to having Serial ATA travel back in time a half decade, and consider that before Ultra160 SCSI there were several previous generations of SCSI, it’s pretty clear that SCSI drive technology was extremely advanced for its time. The system was complicated, requiring pristine quality cabling, but for individuals who needed maximum performance, SCSI was perfect. Since customers were looking toward SCSI as the performance flagship, manufacturers knew they could invest in exotic technologies for their drives. Grandma didn’t care if her 4800 rpm IDE drive couldn’t stream uncompressed HD video, but pros did. So over time, IDE drives focused on increasing capacity (important for the home user) while SCSI drives focused on increasing performance and reliability. It was evolution, with manufacturers succeeding as their HDDs met the needs of each niche.

That is to say, SCSI drives are built to a more reliable standard and with faster performance not because SCSI itself is better than Serial ATA, but because people who own SCSI controllers are willing to pay for that performance and reliability, making it economically sound. The WD Raptors are really modified SCSI drives built to Serial ATA specs. They continue to outperform any traditional 7200rpm drive, and yet when it comes to drive technology they reflect “old” SCSI technology.
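The difference between a simple tag and a “head of queue” tag is easy to show with a toy command queue. This is only a sketch of the queuing idea under simplifying assumptions: a real drive also reorders simple-tagged commands to minimize seeks, which this class ignores, and the class and method names are invented for illustration. The depths of 32 and 256 are the figures quoted above.

```python
from collections import deque


class CommandQueue:
    """Toy tagged command queue with a fixed depth."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def submit(self, command, head_of_queue=False):
        """Queue a command; return False if the queue is already full."""
        if len(self.pending) >= self.depth:
            return False  # the host has to hold the command and retry later
        if head_of_queue:
            self.pending.appendleft(command)  # jumps ahead of everything waiting
        else:
            self.pending.append(command)      # simple tag: joins the back
        return True

    def next_command(self):
        """Hand the drive its next command, or None if the queue is empty."""
        return self.pending.popleft() if self.pending else None


if __name__ == "__main__":
    scsi = CommandQueue(depth=256)
    scsi.submit("read A")
    scsi.submit("read B")
    scsi.submit("urgent metadata write", head_of_queue=True)
    print(scsi.next_command())
```

With only simple tags, the urgent command would have waited behind both reads; the head-of-queue tag lets the host tell the drive what matters most.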

The simple reason we don’t use SCSI more often is price. We don’t find that extra performance worth the extra cost -- media professionals think differently.



The Quadro FX

Of course, the question is between the GeForce 6800 Ultra/GeForce 7800GTX and Quadro FX 4400. Although built around the same underlying design, today’s Quadros and GeForces differ by more than just driver rewrites and modifications to the BIOS; they are truly different cores today.

[image]


It’s interesting to note that the sub-pixel precision of NVIDIA’s platform has remained superior to ATI’s over the last two years. The entire Quadro FX line, including the bottom-of-the-line Quadro FX 330, features 12-bit sub-pixel accuracy. In fact, even the GeForce FX 5200, which most gamers would be ashamed to own, features 12-bit sub-pixel accuracy. ATI’s workstation models don’t compete in this feature set. Their FireGL V3100 has a terrible 4-bit sub-pixel accuracy, the same as the original Rage128, and the newer FireGL V5100 only reaches 8-bit precision. Sub-pixel accuracy is important for placement of polygons and this will make the biggest difference in CAD and 3D DCC applications. 16x FSAA is also a very important feature, more so than in traditional gaming, because lines rather than textures capture much of the data. (Be sure to view the full-resolution pictures and not the auto-resized ones from FiringSquad to see the full effect.)
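To get a feel for what those bit counts mean, here is a short sketch that snaps a vertex coordinate to the sub-pixel grid each precision allows. It assumes the usual fixed-point interpretation, where n bits of sub-pixel precision divide each pixel into 2^n steps; the function names and the sample coordinate are made up for illustration.

```python
def snap(coord, subpixel_bits):
    """Round a pixel coordinate to the nearest representable sub-pixel step."""
    steps = 1 << subpixel_bits          # 4 bits -> 16 steps, 12 bits -> 4096
    return round(coord * steps) / steps


def max_error(subpixel_bits):
    """Worst-case placement error, in pixels, after snapping."""
    return 0.5 / (1 << subpixel_bits)


if __name__ == "__main__":
    x = 10.30  # hypothetical vertex position in pixels
    for bits in (4, 8, 12):
        print(bits, "bits:", snap(x, bits), "max error", max_error(bits))
```

At 4 bits a vertex can land up to 1/32 of a pixel away from where the application put it, while at 12 bits the worst case shrinks to 1/8192 of a pixel. That matters when thin CAD lines that should meet at a point have to actually meet.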

[image]

[image]

[image]


There’s little doubt that these Quadros will be extremely fast performers for workstation applications. The question is how well they do with games.

NVIDIA Quadro FX4400 512MB PCI-E
2x $1800
http://www.nvidia.com

Running Total: $7835


SIDEBAR: That’s right. We spent just as much on graphics as we did for the entire desktop computer.



Final components

The requirements for workstations aren’t significantly different from desktops, so we’ll just breeze through the other components.

Plextor PX-716SA
$125
http://www.plextor.com

Running Total: $7960

Logitech Cordless Comfort Duo - $100
Logitech MX700 (Refurbished) - $50

http://www.logitech.com

Running Total: $8110

1.44MB Floppy Drive
$15
Running Total: $8125



Monitor

Workstation users don’t require the same pixel refresh rates that gamers need. At the very most, you’ll need the ability to handle smooth 30 fps video. Our monitor of choice here is going to be the NEC 1980FXi, which was a FiringSquad Editor’s Choice product and acts as our current reference LCD monitor. Its 18ms S-IPS panel (recall that 25 ms S-IPS is virtually as fast as 16 ms TN+film) and rich color support make it a good fit.

NEC Mitsubishi LCD1980FXi
$650
http://www.necmitsubishi.com
Running Total: $8775

Operating System

Windows XP 32-bit SP2 is still needed for universal compatibility, but the Windows XP Professional x64 edition would be the right approach for maximizing the 4GB of memory. Although 64-bit applications are rare, Photoshop CS2 has a partial 64-bit mode in which more than 2GB of RAM can be used by Photoshop. Home-brew or hand-tuned Linux is an option for those experienced with Linux systems, but for a commercial package, SUSE Linux 9.3 Professional is the standard. Triple booting all three works.

$150 Windows XP Professional 32-bit OEM
$155 Windows XP Professional x64 edition OEM
http://www.microsoft.com

$100 SUSE Linux Professional 9.3
http://www.suse.com


Running Total: $9180

Conclusion

Wow. A 9 thousand dollar system, and I’ve just got a single 19” monitor? The lesson learned here is that when it comes to high-precision, high-performance graphics, it can get quite expensive. Without the Quadros, we’d essentially be able to bring the costs back down to something a little bit more palatable. That said, the pair of Quadro FX 4400’s means that I can drive four Apple 30-inch Cinema HD screens. I could very easily have built it as a $20,000 workstation … but that would be plain crazy.

[Waves Jedi hand] See you tomorrow.


© Copyright 2003 FS Media, Inc.