Summary: With two quad-core Xeon X5365 processors running in tandem at 3.0GHz, Intel's V8 eight-core system packs quite a punch. Join us as Alan and Alexis take a look at the performance potential of this platform in a range of benchmarks.
The arms race continues. Today, we are taking a look at Intel’s latest weapon, dual quad-core Intel Xeon X5365 processors. Intel has affectionately dubbed this the V8 platform, since it effectively puts eight cores of processing power at your disposal. Quad-core Xeon processors have been available from Intel since the end of last year, while the Xeon X5365 is available in select systems today, with broad availability expected in Q3’07.
Back in the day, a car with a V-8 engine got everyone’s attention; now, you need at least 12, or maybe 16, cylinders to get us interested. Similarly, only a year or two ago, having more than one CPU in a computer was out of the reach of an ordinary consumer. Advances in both manufacturing and design have played a part in allowing for this power growth, but the software engineers are also to thank for writing code that makes the best use of this power. Within the past 18 months, the growth in computing power has shattered Gordon Moore’s famous law. This has come through more efficient computing, not just faster computing.
Will we get to the point where we can have too much power? With cars, I’ll tell you that having nearly 300 horsepower in stop-and-go San Francisco traffic doesn’t help anyone except the gas companies. With airplanes, transportation at over Mach 1 could not be sustained, as evidenced by the retirement of the Concorde. Well, the timing of the V8 comes on the heels of Microsoft Vista.
We’re playing with fire today.
The Intel V8 “Media Creation PC” isn’t actually a new platform. In fact, it’s nothing more than an Intel workstation motherboard paired with two 3GHz “Clovertown” Xeon CPUs. The Clovertown CPU is based upon Intel’s Core 2 platform, with the key architectural advantages being a 1333MHz FSB and a quad-core design. Unlike the Apple Mac Pro, Intel’s reference motherboard designs have a single PCIe x16 slot. For this reason, Intel positions the V8 as a Media Creation PC, although comparisons can be made to AMD’s 4x4 (two dual-core CPUs with four PCI Express graphics slots, effectively giving you quad core with Quad SLI).
Motherboard: Intel S5000XVN dual-socket motherboard
The Intel S5000XVN is designed as a motherboard that provides server-class performance and reliability while providing PCIe x16 support for high-end graphics. This isn’t just marketing speak.
Memory: 4GB Samsung DDR2 FB-DIMM @ 667MHz
Although DDR2 from companies such as OCZ and Corsair runs at up to 1GHz, servers and workstations continue to run memory at slower rates. This has less to do with reliability than it has to do with capacity.
CPU: Dual Xeon quad core X5365 @ 3.0GHz, Clovertown cores. FSB 1333MHz
The chips in this setup are difficult to find, as Intel has been giving Apple priority. They run on a 1333MHz bus and draw about 150 watts apiece, more than other Clovertown CPUs. The included heatsinks are a little wimpy compared to what we can get on the aftermarket for Core 2 Duo and Athlon64. Each is just a solid chunk of copper with thin fins and a loud fan. You would think that a $1200 CPU would come with something better than a $20 heatsink and fan.
Power Supply: OCZ ProXStream 1000W Power Supply
Normally, beefy power supplies are overkill for gamers. In this case, we definitely needed more power. Each Clovertown CPU pulls a peak of 150W, and we have a pair, so that sets our power budget at 300W from the get-go. Compare this to the Core 2 Duos, which only require 65W.
Chassis: Silverstone Temjin TJ-07
The TJ-07 continues to be the Bugatti Veyron of the PC chassis industry. The monoblock unibody design has no trouble supporting all the components, and the extensive cooling zones keep everything running within spec.
Video Card: ATI Radeon X1950 XT
Vista x64 has compatibility issues with the combination of NVIDIA GPUs, the latest Forceware drivers (including 158.18), and our professional-grade NEC LCD monitors. If you use a different LCD monitor, it’s OK. If you use the Vista-bundled drivers, it’s OK. If you use regular Windows XP, it’s OK too. It’s just this combination of Forceware 100+, Vista x64, and certain monitors that causes problems. The Radeon has no such problem, so we’re going with the fastest ATI card on the market (at the time of this article).
Hard Drive: Seagate Barracuda 7200.10 750GB SATA2 x2
Going with Serial Attached SCSI would have offered the best performance with our system; however, the argument for high-capacity 7200rpm SATA drives is hard to beat. We keep bouncing back and forth between Seagate and Hitachi drives in our system. Currently, we like the 7200.10 Barracudas with perpendicular recording.
OS: Windows Vista Ultimate 64-bit
As if we were going to run anything else…
Intel V8 – As noted on the previous page
Microsoft Excel 2007
In the above cases, we saw superb scaling of the V8 core over quad core and dual core systems from Intel. The story was different as our benchmarks got more complex.
LS-DYNA is a general-purpose transient finite-element solver. It’s used to simulate all sorts of things, ranging from metal forming applications and structural analysis to large deformation studies like bird strike simulations in aerospace applications. It has its roots at Lawrence Livermore National Laboratory, as the solver used to simulate nuclear warhead design. Its auto-parallelization is superb, and this is a great real-world test of something other than “embarrassingly parallel” computational science.
In this benchmark, I chose to perform a car crash simulation using a dataset available in the public domain. To make testing easier, I only simulated a single time step of a 535,000 element model of a Plymouth Neon crashing into a barrier.
There are a few interesting points to be drawn from the graph. There was a near-linear increase in performance going from the 1.8GHz Core 2 Duo to the 3.0GHz Core 2 Duo. Going from two 3GHz cores to four 3GHz cores represented only a 66% improvement, and doubling the number of cores again to eight only added another 35%.
Based upon data from Sun Microsystems, even AMD Opteron with Infiniband scales substantially better on the same benchmark. Going from one 3GHz Opteron to two single-core 3GHz Opterons improved performance 99%. Doubling that to four single-core 3GHz Opterons resulted in an 87% improvement. Doubling that to eight single-core Opteron 156s resulted in a 93% improvement. Doubling that to sixteen single-core Opteron 156s resulted in an 81% improvement...
In other words, for a memory-bandwidth intensive application such as LS-DYNA, going from dual core to octo core on the Intel platform meant improving productivity 2.25x. Going from two single-core Opterons to eight single-core Opterons with Infiniband interconnects improves productivity by 3.6x – this number should be even higher with native Hypertransport interconnects.
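As a quick check on that arithmetic, the per-doubling gains quoted above can be compounded into an overall speedup. This is only a sketch: the percentages are the ones given in the text, while the helper names `compound_speedup` and `scaling_efficiency` are our own labels for illustration, and "efficiency" here simply means the speedup divided by the increase in core count.

```python
from math import prod

def compound_speedup(step_gains_percent):
    """Multiply successive percentage gains into one overall speedup factor."""
    return prod(1 + g / 100 for g in step_gains_percent)

def scaling_efficiency(speedup, core_ratio):
    """Fraction of ideal (linear) scaling achieved for a given increase in core count."""
    return speedup / core_ratio

# Per-doubling gains quoted in the text, in percent.
intel_2_to_8 = [66, 35]     # 2 -> 4 -> 8 cores on the Intel V8
opteron_2_to_8 = [87, 93]   # 2 -> 4 -> 8 single-core Opterons with Infiniband

for name, gains in (("Intel", intel_2_to_8), ("Opteron", opteron_2_to_8)):
    s = compound_speedup(gains)
    e = scaling_efficiency(s, core_ratio=4)  # 2 -> 8 cores is a 4x increase
    print(f"{name}: {s:.2f}x overall, {e:.0%} of linear scaling")
```

Compounding the rounded percentages gives roughly 2.24x for Intel and 3.61x for the Opterons, in line with the 2.25x and 3.6x figures above. Put another way, quadrupling the core count delivered a little over half of ideal linear scaling on the V8, versus about nine-tenths on the Opteron cluster.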
We unleashed the Intel V8 on our next test application, Bibble 4.9.5. Bibble is a RAW processing tool used by digital photographers. It is well-known for highly optimized code and multi-core support. In fact, when I originally tested the dual dual-core Opteron systems, Bibble was processing RAW files so quickly that I had to run my benchmarks several times to make sure everything was working. It was too fast.
Although Bibble is 8-core aware and sends data to all eight processor cores, the software was unable to saturate the CPUs. This was certainly a disappointment given the prior performance that was seen with the Opteron platform. Importantly, our tests from several years ago have shown that memory bandwidth and latency play a significant role in RAW processing.
Although the memory bandwidth of the Intel V8 was pushing 6GB/sec in Windows Vista x64, AMD’s 4x4 platform pushes closer to 14.5GB/sec of memory bandwidth. Suffice it to say, the Intel V8 platform is memory bandwidth limited.
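To put those two figures in per-core terms, here is a rough sketch that divides each platform's measured bandwidth by its core count. The bandwidth numbers and core counts come from this article; the even split across cores is a simplifying assumption of ours (real workloads do not share bandwidth evenly), and `bandwidth_per_core` is a hypothetical helper name.

```python
def bandwidth_per_core(total_gb_per_s, cores):
    """Average memory bandwidth available to each core, assuming an even split."""
    return total_gb_per_s / cores

# (approximate measured bandwidth in GB/s, number of cores), as quoted in the article
platforms = {
    "Intel V8 (2 x quad-core Xeon)": (6.0, 8),
    "AMD 4x4 (2 x dual-core)": (14.5, 4),
}

for name, (bandwidth, cores) in platforms.items():
    per_core = bandwidth_per_core(bandwidth, cores)
    print(f"{name}: {per_core:.3f} GB/s per core")
```

Under that assumption, each V8 core gets roughly 0.75 GB/s, while each core on the 4x4 platform gets about 3.6 GB/s, which is why adding cores helps less once an application becomes bandwidth-bound.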
What about Games?
Dual-core games are now in the mainstream; however, it remains unclear when we will begin to see games taking advantage of quad- and octo-core systems. We did have troubles with our NVIDIA cards in Windows Vista x64, and so we will have to revisit these numbers at a later date.
For the 3D modeler or power Excel 2007 user, the Intel V8 is amazing. However, this system is almost too fast for today’s software. We aren’t seeing the across-the-board increase in performance that we saw in the past with the move from single to dual core processing. On some of the benchmarks, the 8 cores were not being saturated. Was this caused by limitations in the multi-core code in the software? Memory bandwidth limitations? Or both?
One aspect was slightly disappointing: the memory bandwidth. In its current design, Intel’s memory bandwidth doesn’t scale as well with the processor as AMD’s design. For certain applications, including computational science and, based upon our best analysis, advanced digital photography, AMD’s memory controller on the CPU will have the advantage. We will have to reserve any conclusions until we get some AMD OctoFX action into our labs. Likewise, it’s important to realize that even though the Intel architecture doesn’t “scale as well” as AMD’s architecture, the dual Clovertowns at 3GHz will still give you the best performance in most applications. Equally important, since most of us are only deciding between dual and quad core, the scalability limit isn’t there yet.
In terms of practicality, the V8 in its current form probably isn’t going to make it into many homes. One detail we haven’t mentioned thus far is that with the Intel workstation motherboard, boot time is delayed by at least 35 seconds as the motherboard goes through its diagnostic checks, even in the “fast boot” mode. FB-DIMMs are still too costly and in limited supply. 4GB remains a sweet spot for power users, and the AMD platform has reached 16 and 32GB without the same problems.
There is a solution to this though: time. Give it another year or so and we might be reviewing a small form factor V8.
A few years ago, gamers had to choose between the Pentium III and the Athlon XP. It was a fierce battle between AMD and Intel and things were exciting. We saw the race to 1GHz CPUs with AMD edging out Intel by just a matter of hours. With the Pentium 4 Northwood, the balance of power slowly shifted away from AMD. The scalability of the Pentium 4 core when it came to clock speed helped Intel pull further and further away. Just when AMD’s future seemed doomed, the Athlon64 entered the scene and, in the blink of an eye, the AMD64 platform became the unanimous choice for enthusiasts. With Athlon64 X2, it seemed like AMD was unstoppable... until Core 2 Duo entered the scene. Since then, Intel has changed the course of the war and the Core 2 Duo has become the de facto CPU for hardware enthusiasts.
So our conclusion? With the Intel V8, we are seeing two things. The Intel V8 is fast. Faster than anything we’ve ever tested before. However, for the first time, we are seeing limitations of the platform behind Intel’s Core 2 architecture. This is monumental because we are seeing the potential for AMD’s memory architecture to play a major role in the upcoming 8-core world and beyond. Suddenly, CPUs have gotten interesting again.
© Copyright 2003 FS Media, Inc.