ATi Touts DX10
The briefing...

Thursday 25th May 2006, and a group of journalists congregate in a small but very nicely decorated suite at Paddington's Hilton Hotel. Finally seated, with sharpened pencils, bottled water and an unhealthy number of free jelly beans in our hands, we eagerly await the words of Richard Huddy, worldwide developer relations manager for ATi, regarding DX10 and how ATi will be taking advantage of it.

Unlike previous versions of Microsoft's DirectX API, version 10 will be a clean-sheet product, written from scratch rather than built on its predecessors. DirectX 10 will include no native support for DirectX 9 or earlier versions, but Microsoft have promised compatibility and even an enhanced version of DX9, presumably through some kind of compatibility layer that may mean older titles run slower than they ever have before. I'm speculating on that, of course.

There are many aspects to DirectX 10, including all-new DLLs and the ability to isolate and operate on specific parts of the graphics subsystem, greatly reducing the need for reboots after driver updates and the risk of hardware glitches bringing your system down, but it seems the key feature as far as ATi is concerned is the unified shader architecture.

At present, graphics hardware relies on two types of shader: vertex shaders and pixel shaders. In simple terms, a vertex shader builds a model and a pixel shader colours it in. The trouble starts when you realise that there is only a limited number of shader pipelines available, and it's up to the GPU manufacturer to decide how many should become vertex shaders and how many pixel shaders. The way it has tended to be done so far is roughly three or four pixel shaders for every vertex shader.

[Image: a shark (vertex-heavy, pixel-light) above waves (pixel-heavy, vertex-light)]

The actual vertex/pixel shader split is always going to be a bit of a compromise because, as the image above demonstrates, workloads vary wildly. A fairly complex model like a shark, which is not only animated but must also be skinned so that it can bend and flex without the geometry falling apart, may need a lot of vertex power, but because its body isn't very heavily textured or lit in a complex way, the demands on the pixel shaders are low. In essence, a huge percentage of your silicon is sat idle. Below the shark is an example where the situation is reversed. Modelling waves, certainly large waves, is not terribly demanding in vertex processing terms, but the light and shade, refractions and reflections, foam and spray all require heavy use of the pixel shaders. Again the result is that a portion of your GPU has a snooze until it's needed again.

Here's a graph showing pixel and vertex shader activity while rendering a single frame from a game. In addition to demonstrating nicely why we usually need more pixel shaders than vertex shaders, it also shows that there are relatively large periods of time when one shader type is busy and the other is doing very little.

[Graph: pixel and vertex shader activity while rendering a single frame]

There are times (circled) when the load on the vertex and pixel shaders is fairly evenly matched, and it's on these occasions that the hardware is at its most efficient, in a sense. There are also times when the load balance just isn't optimal, such as at the beginning of the frame, where the pixel shaders are working while the vertex shaders sit idle, and towards the end of the frame, possibly where an animated character is being drawn, where the vertex shaders are busy and the pixel shaders are doing almost nothing.

The way around this problem seems glaringly obvious but was, until now, technologically impossible: you simply create shaders that can operate as either pixel or vertex shaders depending on what's required at any given time. These shaders use exactly the same instruction set no matter what role they are performing, and they have access to exactly the same resources on the GPU. These are unified shaders, and they are the heart of DX10.
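
The contrast between a fixed vertex/pixel split and a unified pool can be sketched in a few lines of Python. This is purely illustrative arithmetic, not GPU code; the unit counts and workload figures below are invented for the sake of the example.

```python
def fixed_split_utilisation(vertex_work, pixel_work, vertex_units=4, pixel_units=12):
    # Dedicated units can only accept their own kind of work, so surplus
    # work of one type cannot spill over onto idle units of the other type.
    busy = min(vertex_work, vertex_units) + min(pixel_work, pixel_units)
    return busy / (vertex_units + pixel_units)

def unified_utilisation(vertex_work, pixel_work, units=16):
    # A unified unit can take either kind of work, so the pool only
    # idles when there is no work of any kind left.
    busy = min(vertex_work + pixel_work, units)
    return busy / units

# "Shark" frame: heavy geometry, light shading.
print(fixed_split_utilisation(vertex_work=10, pixel_work=4))   # 0.5 - most pixel units idle
print(unified_utilisation(vertex_work=10, pixel_work=4))       # 0.875

# "Waves" frame: light geometry, heavy shading.
print(fixed_split_utilisation(vertex_work=2, pixel_work=20))   # 0.875
print(unified_utilisation(vertex_work=2, pixel_work=20))       # 1.0
```

In both of the lopsided frames above, the unified pool keeps more of the silicon busy than the fixed 3:1 split can.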

If a unified shader model wasn't already needed under DirectX 9, it certainly will be under DirectX 10, because in addition to pixel and vertex shaders we'll also see a third type added: the geometry shader. Unlike the vertex shader, which can only operate on single vertices (the points where the sides of a polygon meet), the geometry shader, which will be fed from a vertex shader incidentally, can work with an entire polygon. It can also create new polygons and, equally importantly, destroy them, quite unlike the pixel and vertex shaders, which are fundamentally one-in, one-out processes. Adding and removing triangles from a model may seem like a minor thing, but it makes for a significantly more flexible chip.
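
That one-in, one-out versus create-and-destroy distinction can be shown with a toy sketch. This is plain Python, not real shader code, and the triangle data and culling rule are invented:

```python
def vertex_shader(vertices, transform):
    # One-in, one-out: exactly one output vertex per input vertex.
    return [transform(v) for v in vertices]

def geometry_shader(triangles, per_triangle):
    # Each input primitive may become zero, one, or many output primitives.
    out = []
    for tri in triangles:
        out.extend(per_triangle(tri))
    return out

# Toy rule: destroy triangles tagged as hidden, duplicate visible ones.
tris = [("visible", 1), ("hidden", 2), ("visible", 3)]
amplified = geometry_shader(tris, lambda t: [] if t[0] == "hidden" else [t, t])
print(len(amplified))  # 4: one triangle destroyed, two new ones created
```

The vertex stage can never change the primitive count; the geometry stage here turned three triangles into four.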
Remarkably, the unified shader will have the flexibility to use a pixel shader to work with data that isn't technically related to a pixel at all, so it can output data needed for physics calculations, or vertex data, or it could even output shader routines, which essentially means it could, in a sense, write and run its own code. Sounds like the basis for AI, doesn't it? Perhaps in years to come we'll see GPUs that code routines based on workload and essentially write their own drivers. We're certainly heading towards GPUs that function like CPUs, though with far more parallel processing power.

Dividing the workload up and telling each shader what type of shader it should be, and for how long, is the job of a secretive part of the GPU known as the thread arbiter. ATi tells us that it can tell a shader to switch from, say, a vertex shader to a pixel shader, and that this happens in a single clock. I asked whether a unified shader running as a pixel shader would be as efficient as a dedicated pixel shader and was assured it would be, and likewise for the vertex and geometry shaders. All of this is transparent to game developers, who simply tell the GPU what needs doing and let it decide how it's done.
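
As a thought experiment, the arbiter's load balancing might look something like the sketch below. Everything here, the proportional policy, the queue depths and the unit count, is invented for illustration and says nothing about how ATi's actual hardware works:

```python
from collections import Counter

def arbitrate(queues, units):
    # queues: outstanding work items per shader type, e.g.
    # {"vertex": 30, "geometry": 5, "pixel": 120}
    total = sum(queues.values())
    if total == 0:
        return Counter()
    # Share the units out in proportion to the queued work of each type.
    roles = Counter({role: units * work // total for role, work in queues.items()})
    # Hand any rounding remainder to the busiest queue.
    roles[max(queues, key=queues.get)] += units - sum(roles.values())
    return roles

print(arbitrate({"vertex": 30, "geometry": 5, "pixel": 120}, units=64))
# Counter({'pixel': 50, 'vertex': 12, 'geometry': 2})
```

Rerun every clock with fresh queue depths, a policy like this keeps all 64 units assigned to whichever work actually exists, which is the whole point of unification.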

Geometry shader results can also be written directly to memory for reuse in the pipeline without any CPU intervention. Used correctly, the geometry shader will greatly accelerate particle systems (smoke, fire, water etc.) as well as techniques like cube mapping, which creates world-accurate reflections in reflective surfaces.

Despite all this, ATi seemed keen to point out that this isn't all about making things run faster; it's more about being able to throw realism into the game without slowing it down. So rather than seeing huge increases in frame rate, you're more likely to see things like additional scene clutter, detail, foliage, animation and so on. This is because the data still has to be executed, and a frame of 3D can be drawn no more quickly than it takes to execute the longest or most complex shader routine; with more shaders now on call, however, games can take advantage of them and add extra realism while the essentials are being done. In short, then, we're shooting for realism, not speed.

Another big deal with DirectX 9 was what's known as the small batch problem. In essence, every instruction the graphics pipeline has to action (known as an "object") incurs a performance overhead, partly from DirectX itself and partly from the driver. When lots of small objects are filing through the pipeline, this overhead builds into a quite significant percentage of the overall time-to-screen.
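
A back-of-envelope sketch makes the point. The per-batch overhead and per-triangle costs below are invented round numbers, not measurements of any real driver:

```python
def frame_time_ns(batches, tris_per_batch, overhead_ns=30_000, per_tri_ns=10):
    # Every batch pays a fixed API + driver overhead before any drawing happens.
    return batches * (overhead_ns + tris_per_batch * per_tri_ns)

# The same one million triangles, batched two different ways:
print(frame_time_ns(10_000, 100))   # 310,000,000 ns - roughly 97% of it pure overhead
print(frame_time_ns(100, 10_000))   # 13,000,000 ns - overhead now about 23%
```

With these toy numbers, submitting the identical workload in many small batches is more than twenty times slower, which is why reducing per-object overhead matters so much.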

DX10 will use various techniques to reduce this overhead and ease the small batch problem. One method will be better management of what are known as state changes. Traditionally a state change involved the game barking instructions at the GPU, which would then execute them in their totality. DX10 will use tricks like constant buffers to store things like light positions, and thus reduce the amount of data being shipped around. There will also be what ATi call state "snapshots", which are essentially shader macros: instead of having to send an entire set of instructions to a shader when they've already been executed at least once, that routine can be stored with a particular identifier, and the API can simply request that the shader run routine XYZ again. Then there is predicated rendering, which allows some commands to be ignored entirely if the results of another part of the pipeline say they should be, which is great for things like occlusion culling. There are also texture arrays, which let the GPU swap textures back and forth in real time with no CPU intervention, while paging of graphics memory increases texture handling flexibility and should make for bigger in-game textures.
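
The snapshot idea can be sketched as a simple cache. The class below, the byte costs and the identifier scheme are all invented for illustration and bear no relation to the real DX10 API:

```python
class SnapshotCache:
    """Toy model: full state is sent once, later draws send only a small id."""
    FULL_UPLOAD_BYTES = 256   # invented cost of shipping a whole state block
    ID_BYTES = 8              # invented cost of referencing a stored snapshot

    def __init__(self):
        self._snapshots = {}
        self.bytes_sent = 0

    def submit(self, state):
        # state: a hashable tuple of (setting, value) pairs.
        if state not in self._snapshots:
            self._snapshots[state] = len(self._snapshots)  # assign a fresh id
            self.bytes_sent += self.FULL_UPLOAD_BYTES
        else:
            self.bytes_sent += self.ID_BYTES
        return self._snapshots[state]

cache = SnapshotCache()
state = (("blend", "alpha"), ("cull", "back"), ("depth", "less"))
for _ in range(1000):
    cache.submit(state)   # one full upload, then 999 cheap references
print(cache.bytes_sent)   # 8248 bytes instead of 256,000
```

The saving grows with every draw call that reuses state the GPU has already seen, which is exactly the repetitive pattern the small batch problem creates.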

The aim is to remove the CPU from the equation as much as possible, which is fine, but it makes us far more reliant on fast GPUs and, even more importantly, efficient drivers. The goal is to reduce the API and driver overhead to the point where it looks something like the diagram below:

[Diagram: API and driver overhead reduced under DX10]

ATi have a distinct edge when it comes to designing their unified shader silicon in that they've already done it once, for the X-Box 360, though the 360's lack of geometry shader capability means it's not a true DX10 part, and true DX10 will be the only kind of DX10 you'll get. Gone is partial precision, gone are "caps bits", which allowed a game or application to poll the graphics card for information on its capabilities, and gone are the inconsistencies of previous DX incarnations. Either the hardware supports DX10 in its entirety or it's not a DX10 part at all; it's that simple. That's not to say ATi, or anyone else for that matter, can't include capabilities above and beyond what's called for by DX10, and in fact ATi admitted to having done just that. It simply means those capabilities won't be accessible from within DX10 and will need to be coded for individually by games developers, perhaps eventually being absorbed into the next release.

Apparently John "OpenGL" Carmack has been quoted as calling the X-Box 360 "the most productive graphics development platform I've ever worked on" and referring to it as "clean and powerful hardware that's well documented and easy to exploit". Games will also be far easier to port from X-Box to PC and vice versa, which will speed up development times but may also stifle a certain amount of the creativity we've often benefited from on the PC. There is an element of worry in the fact that PCs and consoles are beginning to share so much core architecture that they are essentially doing the same things in the same ways.

PC users may also feel slightly aggrieved at the fact that a "decent" DX10 graphics card will probably cost more than an X-Box 360, which comes complete with hard drive, game controller, optical drive and so on. When Microsoft sinks so much money into chip development with ATi, I'd be curious to know how much say they have in when the PC gets to benefit from all that development, if ever, and at what cost.

So that's about it in summary. There's much more to DirectX 10 than I've touched on here, including what's being sold to us as a very robust Shader Model 4, but ATi are clearly excited about unified shaders and their ability to move a step closer to 100% efficient silicon. The potential downside is that games will start to look the same no matter whose hardware you're running, though the benefit to developers of having a standard platform to write for may well outweigh this negative. Of course, none of this matters until we see Vista hit the shelves, and that gives us at least six months to ponder the possible impact. Personally, I can't wait for R600.

If you'd like to discuss this article then head over to THE FORUMS.