NVIDIA's GTC kicked off with a number of extended sessions aimed at showing developers how to make better use of the firm's GPUs, but they ended up highlighting a major problem: how to feed the GPU with enough data.

Two out of the three GTC sessions I attended had a very clear theme when it came to GPU performance: the supporting infrastructure simply cannot match the computational capability of the GPU itself. Essentially, NVIDIA's GPUs are being starved of data and there isn't much the firm can do about it in the short term.

Acceleware, a firm that offers training courses in CUDA and OpenCL and is partly funded by NVIDIA, had Kelly Goss give a series of interesting talks about code optimisation for NVIDIA's Kepler architecture. Goss, a training program manager at Acceleware, said the majority of CUDA algorithms were memory bound and proceeded to detail how developers need to be aware of the underlying memory hierarchy in the Kepler architecture to maximise the computational potential of the GPU.

While Goss provided useful information on memory and data management, the underlying point was that developers need to reduce the number of instructions that operate on memory relative to those that perform computation, which effectively means going back to making your algorithm fit the GPU rather than the other way around.
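To make the "memory bound" point concrete, here is a minimal roofline-style sketch. It is not taken from Goss's talk: the function names and the device figures (3,000 GFLOP/s peak, 200 GB/s memory bandwidth) are hypothetical round numbers chosen purely for illustration. It estimates how much compute a kernel can reach given how many floating-point operations it performs per byte it moves.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte read from or written to memory."""
    return flops / bytes_moved


def attainable_gflops(intensity: float, peak_gflops: float, mem_bw_gbs: float) -> float:
    """Roofline estimate: throughput is capped by compute or by memory traffic,
    whichever limit is reached first."""
    return min(peak_gflops, intensity * mem_bw_gbs)


# Hypothetical device figures (assumptions, not from the article).
PEAK_GFLOPS = 3000.0
MEM_BW_GBS = 200.0

# SAXPY (y = a*x + y) on 32-bit floats: 2 flops per element,
# 12 bytes moved per element (read x, read y, write y).
saxpy_intensity = arithmetic_intensity(flops=2, bytes_moved=12)
saxpy_gflops = attainable_gflops(saxpy_intensity, PEAK_GFLOPS, MEM_BW_GBS)

# A kernel needs at least this many flops per byte before compute becomes the limit.
ridge_point = PEAK_GFLOPS / MEM_BW_GBS

print(f"SAXPY intensity: {saxpy_intensity:.3f} flops/byte")
print(f"SAXPY attainable: {saxpy_gflops:.1f} of {PEAK_GFLOPS:.0f} GFLOP/s")
print(f"Memory bound below {ridge_point:.1f} flops/byte")
```

Under these assumed figures a streaming kernel such as SAXPY can use only a small fraction of the peak, which is why raising the ratio of computation to memory traffic matters so much.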

Frankly, going down to the level of the memory hierarchy is something that the vast majority of application developers will not want to do, especially those used to higher level languages such as Java and Python. That Goss has to talk about it to optimise code shows just how much work is required on the developer side to really make GPUs dance.

Later, NVIDIA's Andrew Page, a senior product manager of Advanced Technology at the firm, blamed FPGA video capture boards for limiting the overall performance that GPUs can achieve during video processing workflows. He highlighted the difference in PCI-Express bandwidth between the two: NVIDIA's GPUs for the video processing market support PCI-Express Gen 2 16x, while FPGAs generally top out at PCI-Express Gen 2 8x.
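The gap Page described can be put in numbers with a short sketch. It relies on the published PCI-Express Gen 2 signalling rate of 5 GT/s per lane and its 8b/10b line encoding; the function name is made up for illustration, and the results are theoretical per-direction maxima, not throughput measured on any board.

```python
GEN2_GIGATRANSFERS_PER_S = 5.0   # raw signalling rate per lane, per direction
ENCODING_EFFICIENCY = 8 / 10     # 8b/10b: 8 payload bits for every 10 bits on the wire
BITS_PER_BYTE = 8


def pcie_gen2_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe Gen 2 link."""
    payload_gbits_per_lane = GEN2_GIGATRANSFERS_PER_S * ENCODING_EFFICIENCY
    return payload_gbits_per_lane * lanes / BITS_PER_BYTE


gpu_link = pcie_gen2_bandwidth_gbs(16)      # the GPU side of the workflow
capture_link = pcie_gen2_bandwidth_gbs(8)   # the FPGA capture board

print(f"Gen 2 16x: {gpu_link:.1f} GB/s")
print(f"Gen 2 8x:  {capture_link:.1f} GB/s")
```

On paper the capture board can deliver only half of what the GPU's link can accept, and protocol overhead means real transfers fall short of both figures.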

Page suggested that video frames should be compressed on the capture side and transferred to the GPU, which can then decompress the data using GPU-accelerated algorithms. Page's answer is a pragmatic solution that will undoubtedly mitigate the problem, but the problem is set to become even greater as the video industry moves towards 4K-resolution video and higher frame rates.
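A rough sketch of why compression helps, and why 4K makes matters worse, follows. The pixel format (4 bytes per pixel), frame rate, stream count, and function names are all illustrative assumptions rather than figures from Page's session; the 4 GB/s link budget is the theoretical one-direction maximum of a PCI-Express Gen 2 8x link.

```python
def stream_rate_gbs(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Uncompressed data rate of one video stream in GB/s."""
    return width * height * bytes_per_pixel * fps / 1e9


def min_compression_ratio(required_gbs: float, link_gbs: float) -> float:
    """Smallest compression ratio that lets the data fit the link (1.0 means none needed)."""
    return max(1.0, required_gbs / link_gbs)


CAPTURE_LINK_GBS = 4.0  # theoretical PCIe Gen 2 8x budget, one direction

# Assumed workloads: four simultaneous streams at 60 fps, 4 bytes per pixel.
hd_total = 4 * stream_rate_gbs(1920, 1080, 4, 60)
uhd_total = 4 * stream_rate_gbs(3840, 2160, 4, 60)

for label, total in (("1080p", hd_total), ("4K UHD", uhd_total)):
    ratio = min_compression_ratio(total, CAPTURE_LINK_GBS)
    print(f"{label}: {total:.2f} GB/s uncompressed, needs at least {ratio:.2f}:1 compression")
```

With these assumptions the 1080p workload fits the link uncompressed, while the same workload at 4K does not, and any rise in frame rate pushes the required compression ratio up in direct proportion.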

So far GTC has shown that GPUs may have considerable compute power, but feeding them with enough data is proving to be the biggest challenge for the firm and developers. The problem for NVIDIA is that it cannot solve these problems alone, and any solutions will take years, not months.