Although AMD believes Cray's move to Intel for its latest XC30 cluster was due to GPGPU accelerators supporting PCI Express Gen 3, NVIDIA said customers are unlikely to see much of a performance hit from current-generation accelerators running on PCI Express Gen 2.

Last week I wrote about Cray's decision to move from AMD to Intel for the XC30 cluster, a move AMD described as expected given the growing number of accelerators that use the PCI Express Gen 3 bus, which AMD's Opterons do not currently support. Now NVIDIA, the biggest vendor of GPGPU accelerators, has said that PCI Express Gen 3 does not affect accelerator performance by much and that, in its own internal benchmarks, Intel's Sandy Bridge Xeons simply outperform AMD's Opteron chips.

NVIDIA's Tesla K10-series and Tesla K20-series GPGPU accelerators have been a success for the firm as it pushes the Kepler architecture into the high performance computing (HPC) market. The firm has taken the top spot in November's Top500 list by powering Cray's Titan cluster, which pairs NVIDIA's Tesla K20X with AMD Opteron 6200-series processors and so does not make use of the PCI Express Gen 3 support on the Tesla K20X.

So when I asked Sumit Gupta, general manager of NVIDIA's Tesla business unit, whether PCI Express Gen 3 support for accelerators was a big deal, I was a bit surprised to hear that the bus isn't the bottleneck. Gupta said that in the firm's tests the bottleneck was more likely to be the Opteron's CPU performance compared to Intel's latest Xeons.

"When we [NVIDIA] benchmark AMD Opteron plus GPU versus [Intel Xeon] Sandy Bridge plus GPU, the performance is much better when you use Sandy Bridge plus GPU. It's not to do with [PCI Express] Gen 2 or Gen 3 but more to do with just the core performance of Sandy Bridge [which] is so much better. It also dramatically helps GPU accelerated applications because applications still run on the CPU. I know that is one of the reasons why customers prefer Sandy Bridge plus GPU," said Gupta.

Of course, hearing a chap from NVIDIA talk down AMD (read ex-ATI) hardware is hardly new, but given that NVIDIA has to work with AMD in HPC and that its leading cluster is powered by AMD chips, Gupta has very little reason to lay into AMD without cause. That AMD's Bulldozer and Piledriver Opterons can't compete with Intel's Sandy Bridge Xeon processors in outright grunt is something many have known for a long time, but that this affects the overall performance of clusters that rely on GPGPU accelerators is a further disappointment for the firm and its supporters.

As for the difference between the generations of PCI Express bus, Gupta said that while there is a performance hit in dropping from PCI Express Gen 3 to PCI Express Gen 2, it is far lower than many believe. Gupta said, "The [PCI Express] Gen 2 versus Gen 3, there is an impact but it's not as big as people think. We benchmarked a bunch of applications, in the average case we found the impact to be one or two percent, in the best case we found the case to be 10 percent better performance. So I think people have a notion that PCI Express is super important, don't get me wrong we really look forward to [PCI Express] Gen 3 working nicely with Intel, Sandy Bridge has been a problem, but maybe Ivy Bridge will fix the issues, I'm just saying that it's not as big as people are making it out to be. When you do real application benchmarking I don't see so many application[s] that [are] dependent on [PCI Express] Gen 2 versus Gen 3."

Pipelining mitigates problems of bandwidth limitations

I asked Gupta why the extra bandwidth of the PCI Express Gen 3 bus wasn't providing the performance gains many would expect, given the relatively limited memory (6GB or 8GB) on the accelerator boards. He said developers effectively pipeline their code, using the time the GPGPU spends grinding through one batch of data to line up the next. "They overlap their communications and computation, so as they are computing on the GPU they are moving in the data for the next computation and moving out the results from the previous one," said Gupta.
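
Gupta's description can be illustrated with a rough timing model. The sketch below is not GPU code and has nothing to do with NVIDIA's benchmarks. It simply assumes that every batch takes a fixed time to copy in, compute and copy out, and that all three stages can run at the same time on different batches. The function names and the numbers used with them are hypothetical.

```python
def serial_time(batches, copy_in, compute, copy_out):
    """Total time when each batch is copied in, computed and copied out in turn."""
    return batches * (copy_in + compute + copy_out)


def pipelined_time(batches, copy_in, compute, copy_out):
    """Total time when the three stages overlap across batches.

    Assumption: copy-in, compute and copy-out can each work on a
    different batch at the same moment, with fixed stage times.
    """
    if batches == 0:
        return 0
    # The first batch has to pass through every stage once (pipeline fill).
    fill = copy_in + compute + copy_out
    # After that, one batch finishes every time the slowest stage completes.
    bottleneck = max(copy_in, compute, copy_out)
    return fill + (batches - 1) * bottleneck


# Hypothetical figures: 100 batches, times in milliseconds.
fast_bus = pipelined_time(100, copy_in=2, compute=10, copy_out=1)
slow_bus = pipelined_time(100, copy_in=4, compute=10, copy_out=2)
```

With these made-up figures, doubling both transfer times moves the pipelined total from 1003 ms to 1006 ms, because the compute stage remains the bottleneck. Without overlap the same change would move the total from 1300 ms to 1600 ms. Only when a transfer stage becomes slower than the compute stage does the bus start to dictate the overall time.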

So according to Gupta, developers are able to overcome some of the limitations posed by the bandwidth of the PCI Express bus by using the GPGPU as a batch processor rather than streaming large datasets through it in real time. Nevertheless, Gupta said, "You do need [PCI Express] Gen 2, you can't move down to [PCI Express] Gen 1, that would kill performance".
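
For a sense of scale, the theoretical bandwidth of each PCI Express generation can be worked out from its per-lane signalling rate and line encoding. The sketch below assumes a 16-lane slot, which the article does not specify, and ignores packet and protocol overhead, so real transfers will be slower. The helper names are hypothetical.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency for each
# PCI Express generation: Gen 1 and Gen 2 use 8b/10b, Gen 3 uses 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def peak_bandwidth_gb_s(generation, lanes=16):
    """Theoretical one-direction bandwidth in GB/s, ignoring packet overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


def transfer_seconds(gigabytes, generation, lanes=16):
    """Best-case time to move `gigabytes` of data across the bus."""
    return gigabytes / peak_bandwidth_gb_s(generation, lanes)


# Assumed workload: filling a 6GB accelerator board once per generation.
for gen in sorted(PCIE_GENERATIONS):
    print(gen, round(peak_bandwidth_gb_s(gen), 2), round(transfer_seconds(6, gen), 2))
```

On these assumptions a 16-lane link peaks at about 4GB/s for Gen 1, 8GB/s for Gen 2 and 15.75GB/s for Gen 3 in each direction, so filling a 6GB board takes at best roughly 1.5, 0.75 and 0.38 seconds respectively. Each step down a generation roughly doubles the time data spends on the bus, which makes it harder to hide transfers behind computation.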

Expanding on his comment about PCI Express Gen 3 support on Sandy Bridge, Gupta said, "Most devices have had problems working with Gen 3 on Sandy Bridge", adding that these were stability problems. Nevertheless, PCI Express Gen 3 support clearly didn't harm the University of Texas' Stampede cluster, which, with its Xeon Phi 5110P accelerator boards, came in at number seven on November's Top500 list.

AMD's lack of support for PCI Express Gen 3 in its Opteron 6200 and 6300-series processors is a problem, but it seems increasingly unlikely that HPC accelerator boards are the reason it is losing business, as the firm claims. Cray said PCI Express Gen 3 support enables it to incorporate its Aries interconnect chip, and now NVIDIA says GPGPUs, including its own, have very little to gain from PCI Express Gen 3. All of this circles back to the point I made last week about AMD missing out on an important technology, and it adds another worrying component to the mix: AMD's core CPU performance is limiting Opterons even in situations where GPGPUs provide the vast majority of overall processing power.