
Intel Discusses Nehalem, Larrabee, Dunnington
March 17, 2008 Chris Crazipper Angelini

Summary: Yesterday Intel shed more light on Nehalem, its upcoming next-generation CPU designed to replace today's Core 2 processors. The company also divulged more details on its graphics project, codenamed Larrabee. In this article Chris goes over the highlights of Intel's briefing with members of the press. Read all about it inside!


Introduction


At the same time, this is business as usual for Intel. The company is adhering to its architecture and silicon cadence—the tick and tock that enables Intel to move to a new manufacturing technology and then launch a fresh micro-architecture once manufacturing has been perfected. Intel’s 45nm Penryn family, represented on the desktop as the dual-core Wolfdale and quad-core Yorkfield chips, was the tick. Later this year, Intel will start production on the tock—an entirely new architecture currently called Nehalem.

Pat Gelsinger, Sr. vice president and general manager of Intel’s Digital Enterprise group, gave us a fairly in-depth look at the Nehalem die in anticipation of IDF Shanghai 2008. We also got an idea of how Intel will modularize the architecture to create a diversified product lineup targeting different price points. We even came away with information about the platform set to power high-end desktop and workstation Nehalem-based machines.

After he had covered Nehalem, Intel’s Gelsinger spent some time discussing Larrabee, a hush-hush graphics project expected in the 2009-2010 timeframe. Intel wasn’t ready to get into specifics about the chip’s architecture or specifications. In fact, we’ve learned more about Larrabee from reading Ars Technica’s coverage of the processor. However, there were a handful of interesting nuggets tossed around and an ultra-confident endorsement by Gelsinger. If all goes as planned, Larrabee stands a good chance of changing the GPU landscape for us gamers.

[image]


On an unrelated but still interesting note, Intel also introduced plans for its Tukwila and Dunnington designs. The former is an upcoming addition to the Itanium family. It’s a monster, made up of two billion transistors and comprising four cores, 30MB of cache, and dual integrated memory controllers. Obviously, it won’t have any bearing on desktop gaming, but the die shot is incredible when you consider its complexity.

The latter, Dunnington, will be the final Penryn-based Xeon. Pat Gelsinger explained the company’s approach to Dunnington as a balance based on extensive workload analysis, which is why you’re seeing six cores and a fairly large cache.

[image]

“In this segment of the market, everything is already thread-enabled,” Gelsinger says. So, the company had the option to go quad-core with lots of cache or eight-core with less cache. Eight-core with lots of cache would be cost-prohibitive for a Xeon chip. The decision to implement six cores and 16MB of L3 cache yielded, in Intel’s collective mind, the best possible compromise. Dunnington is to be manufactured using Intel’s 45nm Hi-K node—good, since the design incorporates 1.9 billion transistors on a single die. To put that into perspective, the quad-core Yorkfield, which actually consists of two dual-core Wolfdales, sports 410 million transistors times two.

When Dunnington launches in the second half of 2008, Intel says it’ll be socket-compatible with the Caneland platforms already shipping. The quad-socket, dual independent front side bus chipset we know as the 7300 is decidedly enterprise-class, so don’t expect to see Dunnington make its way into gaming platforms any time soon.




Nehalem, Nude

Does This Look Familiar?


Although Intel’s Gelsinger made no reference to his competition, he had to know the press would, at some point, put some of the design elements of Nehalem up against AMD’s Barcelona and Phenom architectures. As far as we see it, there’s nothing wrong with that. It’s been said over and over that AMD has an extremely elegant solution. Even if it isn’t winning gold medals for performance right now, the company’s engineering principles are solid.

The die shot Intel showed us featured four cores. However, Gelsinger says Nehalem will scale from two to eight cores. Each core has its own L1 and L2 cache, along with access to a central pool of shared, inclusive L3 cache. And the execution resources are fed with data piped in through an integrated three-channel DDR3 memory controller. If that isn’t Phenom-ish enough for you, logic for a HyperTransport-like QPI (QuickPath Interconnect) officially replaces the front-side bus Intel has relied on for so long.

[image]


One of the keys to Nehalem, according to Gelsinger, is its scalability. As mentioned, Intel can build the 45nm chips with as few as two or as many as eight cores. It can implement one QPI link or more, if the bandwidth is needed. The L3 cache and memory controller are also separate components. Hopefully, as memory technology evolves, the modularity of the memory controller will allow Intel to adjust the logic in kind.

The last component of Intel’s scalability story is an integrated graphics block. High-end Nehalem-based processors will naturally rely on discrete graphics for optimal performance. But the more mainstream models will include integrated graphics on the CPU package (not on-die). Gelsinger wasn’t ready to elaborate on the potency of Intel’s upcoming integrated graphics solution. However, he did say it’d be an evolutionary step forward from what we see built-in today and not related to Intel’s work with Larrabee. Given what we’ve seen from G35, Intel has significant work to do before it’s able to compete with the performance of AMD’s graphics technology.

[image]


A Micro-Architecture Unveiled


Intel is singing a much different tune today than it was during NetBurst’s heyday. Back then the story was all about clock speed and tweaking the execution core in whatever way enabled the fastest frequencies. Now Intel is focused on maximizing IPC and managing power.

Nehalem retains the ability to process four instructions per clock cycle and in that way is similar to the Core 2 Quad preceding it. Intel is bolstering performance by bringing back SMT (we knew it as Hyper-Threading back in the day), extending the SSE4 instruction set, adding more cache, as mentioned, and improving the way that data moves through the entire platform, ideally delivering two to three times more peak bandwidth, according to Intel. At the same time, Nehalem chips will sport dynamically managed cores, threads, cache, and interfaces, which we interpret to mean the processor’s building blocks will be throttled up and down, or even turned off completely, in response to load. This is really good news if you’re one of those power users eyeballing the virtualization features in Windows Server 2008, for example. One powerful Nehalem processor, complemented by several gigs of memory, can drive three or four virtualized operating systems without breaking a sweat. Then, when the heavy lifting is done, it can scale back to a much more energy-efficient state. It remains to be seen just how granular Intel gets with power management in its next-gen architecture.

There are a handful of on-chip changes that’ll speed things up as well. Nehalem boasts increased parallelism, boosting the number of micro-ops (pieces of x86 instructions) that can be in-flight at any given time. Intel enhanced commonly used algorithms to help minimize “dead cycles,” too. Expect to see gains in threaded software, where faster synchronization primitives improve performance. Intel’s branch predictor, a tool for guessing whether a conditional branch is taken or not, should be more accurate thanks to a new second-level branch target buffer and a new renamed return stack buffer. Gelsinger says to expect the second-level BTB to improve branch predictions in apps with large code footprints, like databases. The RSB should help Nehalem avoid return instruction mispredictions.
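For readers who haven’t run into a return stack buffer before, here is a minimal textbook-style sketch of the general idea, not Intel’s design. Intel didn’t share implementation details, so the class name, the depth, and the overflow policy are all our own assumptions, and the “renamed” part of Nehalem’s RSB isn’t modeled at all.

```python
class ReturnStackBuffer:
    """Toy return-address predictor: a small fixed-depth stack (hypothetical model)."""

    def __init__(self, depth=16):  # depth is an arbitrary illustrative value
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # A full buffer drops its oldest entry to make room for the new one.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def predict_return(self):
        # The most recent call is the one expected to return next.
        return self.entries.pop() if self.entries else None


rsb = ReturnStackBuffer(depth=2)
for address in (0x10, 0x20, 0x30):  # three nested calls, room for only two
    rsb.on_call(address)

predictions = [rsb.predict_return() for _ in range(3)]
print(predictions)
```

In this toy run the two innermost returns are predicted correctly, while the outermost one finds the buffer empty because its entry was overwritten. That kind of lost entry is one way a return gets mispredicted in a simple buffer like this.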

[image]


By adding Hyper-Threading to Nehalem, Intel is going to make it possible for a single processor equipped with four cores to operate on eight threads at the same time. In the past, Hyper-Threading received mixed reviews because it didn’t always spit back higher performance numbers. Perhaps it was ahead of its time, though. Threaded software was still rare and the only real way to show it off was in a multi-tasked environment. Threading is much more prevalent now, though, and Intel’s other enhancements (bigger cache, higher bandwidth) make Hyper-Threading a more attractive feature.

Speaking of cache, Nehalem sticks to the same 32KB instruction / 32KB data L1 cache configuration as existing Core processors. Each core gets its own 256KB L2 repository. And there’s an 8MB shared L3 cache available to all four cores.
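To put those numbers side by side, here is a quick tally for the four-core die Intel showed. It is only a sketch: the variable names are ours, and two threads per core is inferred from the four-cores, eight-threads figure above.

```python
KB = 1024
MB = 1024 * KB

cores = 4
threads_per_core = 2             # inferred: four cores handling eight threads
l1_per_core = 32 * KB + 32 * KB  # instruction + data
l2_per_core = 256 * KB
l3_shared = 8 * MB

hardware_threads = cores * threads_per_core
total_cache = cores * (l1_per_core + l2_per_core) + l3_shared

print(f"{hardware_threads} hardware threads")
print(f"{total_cache / MB:.2f} MB of cache across all levels")
```

Because the L3 is described as inclusive, this sum counts raw cache capacity rather than unique data. Whatever sits in L1 or L2 is also held in L3.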



Resurfacing the Streets

Bye Bye, FSB



As they say, the devil is in the details and you probably don’t care how information is moving in your system, so long as Crysis is running at a playable frame rate or your video decode is smooth. Intel wants to improve its communications situation there, though. And so the company is eradicating the traditional front-side bus in favor of the QuickPath Interconnect (formerly referred to as CSI). QPI is point-to-point, like HyperTransport. And while representatives at Intel stop one step short of calling it a serial link, since data still moves in parallel, the QPI is much narrower and much faster than the front-side bus of old. In fact, a QPI link runs at 6.4 Gigatransfers per second, delivering up to 25 GBps of bandwidth. Each Nehalem-based CPU features two QPI links, creating a triangle in a dual-chip server configuration between both processors and an I/O hub. You’ll only use one link on a 1P desktop.
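The 25 GBps figure can be reproduced with some back-of-the-envelope arithmetic. The sketch below assumes each link carries 16 bits (2 bytes) of data payload per transfer in each direction, and that the quoted number adds both directions together. Neither detail was spelled out in the briefing, and the function name is made up for illustration.

```python
def qpi_peak_gbps(gigatransfers_per_sec=6.4, payload_bytes=2, directions=2):
    """Theoretical peak link bandwidth in GB/s.

    payload_bytes and directions are assumptions, not figures from Intel's briefing.
    """
    return gigatransfers_per_sec * payload_bytes * directions


one_way = qpi_peak_gbps(directions=1)
both_ways = qpi_peak_gbps()

print(f"Per direction:   {one_way:.1f} GB/s")
print(f"Both directions: {both_ways:.1f} GB/s")
```

Under those assumptions the link tops out at 12.8 GB/s each way, or 25.6 GB/s combined, which lines up with the roughly 25 GBps Intel quoted.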

[image]


According to Jeff Casazza, technology marketing manager in Intel’s server platform group, QuickPath will make the biggest impact in the server world, where multi-core chips and multiple processor sockets resulted in the worst digital traffic jams.

Hello, Memory Controller


As the front-side bus fades to black, Intel will also introduce its first integrated memory controller, which Pat Gelsinger says supports DDR3 at 800, 1066, and 1333 MHz. The controller sports three channels per processor, pumping out copious bandwidth. Of course, that also means installing memory modules three at a time if you’re hoping to maximize performance. That’s six at a time if you’re building a server or workstation with two CPUs. Then again, in a 2P configuration, Intel says DDR3-1333 will deliver four times more memory bandwidth than today’s Harpertown architecture on a 1600 MHz FSB. Talk about opening the floodgates.
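For a rough sense of scale, theoretical peak memory bandwidth can be estimated from the data rate, the channel width, and the channel count. This sketch treats the quoted 800/1066/1333 figures as data rates in megatransfers per second and assumes a standard 64-bit (8-byte) DDR3 channel. Both are our assumptions, as is the function name.

```python
def ddr3_peak_gbps(data_rate_mtps, channels=3, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bytes_per_transfer=8 assumes a standard 64-bit DDR3 channel.
    """
    return data_rate_mtps * bytes_per_transfer * channels / 1000


per_socket = {rate: ddr3_peak_gbps(rate) for rate in (800, 1066, 1333)}
two_socket_1333 = 2 * per_socket[1333]

for rate, gbps in per_socket.items():
    print(f"DDR3-{rate}: {gbps:.1f} GB/s per socket")
print(f"2P at DDR3-1333: {two_socket_1333:.1f} GB/s")
```

These are ceilings, not measured throughput. We haven’t tried to check Intel’s four-times claim here, since that depends on how the company counted Harpertown’s front-side bus bandwidth.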

Obviously, this is a complete overhaul of Intel’s existing platform design where you have a processor connected to a memory controller hub, which is then joined up to an I/O controller hub. Instead, a high-end Nehalem-compatible chipset will look a lot more like AMD’s 790FX with one northbridge component dedicated to PCI Express connectivity and a southbridge responsible for storage, sound, USB, and so on.

As you move down to the mainstream level, the core logic story gets even simpler. The I/O hub that delivered PCI Express functionality disappears and the Nehalem processor hooks right up to the southbridge. What was once a three-chip platform (CPU, northbridge, and southbridge) turns into a two-chip configuration. Hopefully, the arrangement helps reduce costs as entry-level Nehalem systems emerge in 2009.



Larrabee

Intel Talks Graphics


Intel is still very conservative when it comes to details on its Larrabee project. Most of what’s already known comes from leaked presentations and the subsequent online analysis. Nevertheless, we were interested to hear how Intel plans to change the face of software development with an approach that borrows from its processor business.

Rather than zero in on specifics about the technology, Pat Gelsinger chose to paint a broad picture about what Intel thinks is necessary for an era of visual computing. Think accurate shadows and behavioral realism able to give us new levels of interactivity. This is a platform problem, says Intel. The company has already talked about its next-gen processor and chipset. Now it’s working on the graphics side.

[image]


After listening to the recorded briefing twice, I still came away scratching my head a bit, wondering how Larrabee would manifest itself in the 2009-2010 timeframe. The highlights Intel’s Gelsinger covered include:


  • Many-core architecture running in parallel

  • Big vector units per core

  • Cache coherent

  • IA-programmable




When you combine all of those high-level attributes, it becomes a little clearer that Intel is designing a graphics architecture that will benefit from the company’s extensive library of programming tools, easing the development workload.

Here’s where Larrabee may take the graphics industry by surprise. Remember the industry reaction when the Cell Broadband Engine started making its rounds? Programming the hardware was said to be extremely difficult, despite the results you could get out of it. According to Gelsinger, the reaction from software developers involved with Larrabee has, in contrast, been tremendous thus far—in fact, he says it has received more enthusiasm than any other program in his 30-year career.

[image]

Pressed for more information, Gelsinger pulled back a bit. He did, however, reveal that Larrabee would emerge as a discrete graphics product with full support for OpenGL and DirectX. It would also incorporate a richer programming model enabled by Intel’s own tools. Because the software guys are already familiar with a lot of those tools via the company’s CPUs, Intel expects that the ISV community will be able to do more than it has in the past with DirectX, CUDA, and so forth.



Playing a Promising Hand


At the same time, we can already see the marketing guys at AMD saying “we told you so.” Yes, yes. AMD’s HyperTransport interface and integrated memory controller were both great architectural moves that helped the Athlon trounce Intel’s NetBurst. And now both companies are going to market with very similar concepts. Imitation is the sincerest form of flattery, is it not?

With regard to Intel’s foci at this year’s IDF Shanghai, the Tukwila and Dunnington MP designs are hardly relevant to gaming. Impressive-looking behemoths, perhaps. But you’d never look to 7000-series Xeons for anything short of enterprise applications.

Nehalem is far more interesting, despite the fact that Intel is talking about a processor and platform more likely to bear the Xeon moniker than the Core brand. Even then, we’re talking early 2009 before the first Nehalem-based servers, workstations, and high-end desktops see widespread availability. The future looks bright, though. Triple-channel DDR3 memory, a high-speed point-to-point interconnect, Hyper-Threading revived—we’ll take all three.

Larrabee remains the dark horse. Intel is projecting 2009-2010 readiness for the hardware, and of course it’ll need a development effort to be well underway. AMD and NVIDIA won’t be sitting idle between now and then, so it’ll be interesting to see how Intel’s vision today compares with hardware in a year or two. There’s a good chance that the ISV community’s experience with Intel’s processors will lead to a quick adoption of the company’s efforts in graphics.



© Copyright 2003 FS Media, Inc.