Inside Matrox Architecture
256-bit DualBus architecture
Matrox's Millennium G400 is based around its "256-bit DualBus architecture". The G200 also used a dual bus architecture, though only 128 bits wide, yet there is still some confusion as to what the term means. You may have heard about the G400's tremendous performance at high resolutions and color depths - this can be directly attributed to the DualBus architecture of the chipset. This is how it's done.
Data on the video card is transferred through the video card's onboard bus. The bus, then, is just the path that data takes on the card. The main points of interest on this path are the graphics engine and data buffer(s). The graphics engine processes data necessary for the images that are displayed on the screen, while the data buffer stores data or instructions, which are both used by the engine.
Typically, a video card has a single bi-directional bus, so data travels in only one direction at a time. On any given clock cycle, the data can be going to the graphics engine or coming back from it to the buffer, but not both. Other video cards use this arrangement with a 128-bit bus, so data flows in chunks of 128 bits (or less), in one direction per cycle. Matrox took a different approach. With its 256-bit DualBus, Matrox divided the bus into 2 independent, unidirectional buses, each 128 bits wide, for a total width of 256 bits. Because the two buses are separate, data can flow to the graphics engine and from the graphics engine at the same time.
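To make the difference concrete, here is a minimal sketch in Python that counts how many bits each arrangement could move in each direction over a number of clock cycles. It is an idealized model, not a description of Matrox's actual logic: the function names are made up for illustration, and it assumes the shared bus simply alternates direction every cycle and that every cycle carries a full 128 bits.

```python
BUS_WIDTH_BITS = 128  # width of each bus, as stated in the text


def shared_bus_bits(cycles):
    """Single bi-directional bus: each cycle moves data in one direction only.

    Assumption: directions simply alternate, starting toward the engine.
    """
    to_engine_cycles = (cycles + 1) // 2
    from_engine_cycles = cycles // 2
    return to_engine_cycles * BUS_WIDTH_BITS, from_engine_cycles * BUS_WIDTH_BITS


def dual_bus_bits(cycles):
    """Two unidirectional buses: both directions move data on every cycle."""
    return cycles * BUS_WIDTH_BITS, cycles * BUS_WIDTH_BITS


if __name__ == "__main__":
    for name, model in (("shared", shared_bus_bits), ("dual", dual_bus_bits)):
        to_engine, from_engine = model(100)
        print(f"{name}: {to_engine} bits to engine, {from_engine} bits from engine")
```

Under these assumptions, 100 clock cycles on the shared bus move 6,400 bits each way, while the dual arrangement moves 12,800 bits each way. Real traffic is rarely this evenly balanced, so treat the figures as an upper bound on the benefit.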
To keep everything running efficiently, the chip logic for the graphics engine makes sure that both buses are transferring data on every clock cycle. Another point to remember is that video memory can be clocked higher than the graphics engine, as is the case with most of the current generation of video cards. This works well with the 256-bit DualBus architecture, because the "data in" and "data out" buffers can stay constantly active, each having its own dedicated 128-bit bus. Combined with a fast video memory bus, this yields great performance. It especially boosts the Millennium G400's 2D performance, but also helps in 3D and video.
Dual command pipelining
This feature is part of the DualBus architecture. It is a technique for avoiding clock cycles wasted while waiting to send commands from the "data in" buffer. With a single 128-bit bus, the entire bus is used for either sending or receiving data or instructions from the data buffer, so you can send only on every other clock cycle and receive on the alternate cycles. One clock cycle sends, and the next receives. With dual command pipelining, the G400 can begin reading the received data from the "data out" buffer while the "data in" buffer begins sending the next set of data or instructions. This takes further advantage of the fact that each data buffer has its own data bus.
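The overlap can be sketched as a cycle-by-cycle schedule. The Python below is only an illustration under simplifying assumptions: every send and every receive takes exactly one clock cycle, a result is ready to be read on the cycle after its command was sent, and the function names are hypothetical. Each entry in a schedule is a pair (command being sent, result being received), with None meaning that direction is idle.

```python
def alternating_schedule(n_commands):
    """Single shared bus: a send cycle, then a receive cycle, per command."""
    schedule = []
    for i in range(n_commands):
        schedule.append((i, None))  # bus busy sending command i
        schedule.append((None, i))  # bus busy receiving result i
    return schedule


def pipelined_schedule(n_commands):
    """Two buses: send command i while receiving the result of command i - 1."""
    if n_commands == 0:
        return []
    schedule = []
    for cycle in range(n_commands + 1):
        send = cycle if cycle < n_commands else None
        receive = cycle - 1 if cycle >= 1 else None
        schedule.append((send, receive))
    return schedule


if __name__ == "__main__":
    for name, build in (("alternating", alternating_schedule),
                        ("pipelined", pipelined_schedule)):
        schedule = build(4)
        print(f"{name}: {len(schedule)} cycles -> {schedule}")
```

In this model, 4 commands take 8 cycles on the shared bus and 5 cycles with the overlap, and the gap widens as the command stream gets longer, since only the first send and the last receive run without a partner.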