Cray's XC30 high performance computing (HPC) cluster marks a significant loss for AMD, one that stems from the firm's decision to adopt PCI Express Gen 3 almost two years after Intel. That leaves AMD selling processors that support a bus no longer fit to meet customers' demands, and it is forcing loyal customers over to Intel.

Previously I wrote that AMD had not made the most of technological advantages it had over Intel with its Opteron range of processors and one of the advantages I mentioned was HyperTransport, a high performance bus that back in 2003 was miles better than anything Intel had to join up its Xeon processors. Now it seems AMD has lost the opportunity to do so and has been very slow to support PCI Express Gen 3, something Intel has been supporting with its Sandy Bridge Xeons since March 2012, leading to Cray jumping over to Intel for its latest HPC cluster design.

As Cray - arguably the biggest name in HPC - was getting ready to take the wraps off the XC30, previously known as Cascade, I spoke to Barry Bolding, Cray's vice president of Storage and Data Management and corporate marketing about why the firm had decided to jump from AMD to Intel. Cray has been one of AMD's biggest supporters over the past decade and brought the chip designer considerable marketing clout so its move to Intel caused something of a double-take.

During the course of our conversation, Bolding made it clear that Cray's choice of Sandy Bridge Xeons was due to PCI Express Gen 3 support, which is still lacking from AMD's Opteron processors. AMD's Opteron 6300-series processors support PCI Express Gen 2 and use HyperTransport as the bus between processors.

Bolding said, "Our move to PCI [Express] Gen 3 was critical to having higher bandwidth in and out of compute nodes and currently Intel has the best PCI [Express] Gen 3 support, so it matches our network and the performance is good. Right now Intel's providing us with a PCI [Express] Gen 3 processor that can work with good data rates on our network. That will continue to evolve, with Interlagos [AMD Opteron 6200-series] today, we run with PCI [Express] Gen 2, so I think that's one aspect. The other aspect is in 2013, we believe that Sandy Bridge and Ivy Bridge processors will provide cutting edge general X86 performance."
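To put rough numbers on Bolding's "higher bandwidth in and out of compute nodes", here is a back-of-the-envelope sketch of per-direction PCI Express throughput. The line rates and encodings (5 GT/s with 8b/10b for Gen 2, 8 GT/s with 128b/130b for Gen 3) come from the PCI Express specifications rather than from Bolding. The x16 link width is my assumption for illustration, not a figure Cray quoted, and the function names are hypothetical.

```python
# Theoretical per-direction PCI Express throughput, before protocol overhead.
# Line rates and encodings are from the PCIe specifications; the x16 link
# width used below is an illustrative assumption, not a figure from Cray.
GENERATIONS = {
    "Gen 2": {"gigatransfers_per_s": 5.0, "payload_bits": 8, "encoded_bits": 10},
    "Gen 3": {"gigatransfers_per_s": 8.0, "payload_bits": 128, "encoded_bits": 130},
}


def lane_gbytes_per_s(gen):
    """Usable GB/s for one lane in one direction."""
    spec = GENERATIONS[gen]
    efficiency = spec["payload_bits"] / spec["encoded_bits"]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return spec["gigatransfers_per_s"] * efficiency / 8


def link_gbytes_per_s(gen, lanes=16):
    """Usable GB/s for a link of the given width in one direction."""
    return lane_gbytes_per_s(gen) * lanes


if __name__ == "__main__":
    for gen in GENERATIONS:
        print(f"{gen}: {lane_gbytes_per_s(gen):.3f} GB/s per lane, "
              f"{link_gbytes_per_s(gen):.2f} GB/s for x16")
```

On those assumptions a Gen 3 link carries a little under twice what a Gen 2 link of the same width does, because the higher line rate is combined with a far less wasteful encoding. Real-world throughput will be lower once packet and protocol overheads are counted.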

Bolding effectively confirmed that AMD's HyperTransport no longer holds the crown when it comes to interconnect performance between CPUs in the HPC cluster environment. That is a worrying sign for AMD, as one of its core technologies is no longer a unique selling point and, perhaps even more importantly, is being perceived as inferior to market alternatives.

Bolding explained the thought process behind Cray's choice of processor for a cluster design. "When we [Cray] built the XT, XC, XK line [of clusters], the best way to talk to a processor was through AMD's proprietary HyperTransport mechanism. That's what we based our network on because there wasn't any better way to talk to a processor, so it started with the network and the Opteron happens to be a great processor. [In the XC30] it starts with the PCI [Express] Gen 3 because you need to have a more scalable network. We even considered building a proprietary interconnect interface but it limited our choices in processors," said Bolding.

Bolding added that Cray would continue to sell AMD chips in its existing XK6 clusters, and let us not forget that the XK6-based Titan cluster sporting Opteron 6200-series processors topped the prestigious Top500 list last week, albeit with considerable help from Nvidia's Tesla K20X accelerator boards. Even so, Cray's decision to move away from AMD processors in the XC30 is damaging for the chip designer. The damage comes not from lost sales volumes, as AMD had previously told me HPC accounts for somewhere between five and 10 percent of its Opteron sales, but from the reasons behind the move, which strongly suggest long-term operational problems at the firm.

PCI Express Gen 3 can't come soon enough

When I asked Suresh Gopalakrishnan, the corporate vice president and general manager of AMD's Server Business Unit, about Cray's move to Intel he said, "We will have PCI Express Gen 3 as well. The switch by Cray to PCI Express Gen 3 is expected on that side of the business, because PCI Gen 3 is where all the discrete GPU accelerators will be connected to, so we have Gen 3 in the works as well. There is nothing preventing putting an AMD chip into a Gen 3 based system." AMD seems to believe the move to PCI Express Gen 3 was due to supporting the latest and greatest accelerators, which may well be the case, but I fear the truth is somewhat nearer Bolding's explanation.

Cray's value-add on HPC clusters is its interconnect ASICs, which in the case of the XC30 cluster means the new Aries chip. The performance of that chip is vital to Cray winning business because the majority of hardware in an HPC cluster these days is made up of common off-the-shelf components. Gopalakrishnan is of course right that the latest accelerators such as Nvidia's Tesla K10-series and K20-series, AMD's FirePro S10000 and Intel's Xeon Phi all support PCI Express Gen 3. But given that Titan pairs Nvidia's Tesla K20X with Opteron 6200-series chips that support PCI Express Gen 2, it is clear that not everyone is looking to PCI Express Gen 3 solely to squeeze out the final drop of accelerator performance.

Gopalakrishnan said AMD would have PCI Express Gen 3 support in its next generation Opteron processors, currently known as Steamroller. As for why AMD doesn't have PCI Express Gen 3 he told me, "It's based on how the roadmaps are executed. It's not something like deliberately saying, 'hey we don't want to do [PCI Express Gen] 3.0', it's just that when the [Opteron] 6300 had all its IP snapped and put into the SoC bus, [PCI Express Gen] 2.0 was available. From a roadmap point of view the next generation will have [PCI Express Gen] 3.0 IP. The IP is ready so it's being worked into the SoC at this point."

It is important to realise just how far behind AMD will be when its Steamroller Opterons come out. Intel's Sandy Bridge Xeons came out in March 2012 while Steamroller Opterons, if AMD continues with its one-year product cycle, will arrive almost 20 months later supporting this seemingly must-have connectivity. Now there's fashionably late, but in this case AMD is so late it may not even catch the hangover in the morning, as it remains almost inconceivable that Intel won't have rolled out its Ivy Bridge Xeon processors by then. While Ivy Bridge Xeons won't bring that much in terms of performance, as can be seen by the marginal improvements in benchmarks on the consumer processors based on the architecture, they will bring power utilisation benefits that come from shrinking the geometry from 32nm to 22nm, something very important for the server and HPC market.

Intel offers more than just connectivity

For AMD the real kicker became clear when I asked Bolding whether Cray would win any new business by simply having Intel chips, effectively ignoring the performance aspect of Intel's Sandy Bridge Xeons. He said, "If you look at the market presence Intel Xeon has a larger market presence than AMD does, there's no doubt about that. While most customers want the best performance, they are agnostic whether it is AMD or Intel, there are some that prefer one or other. Just looking at the market presence, our total addressable market will increase by having an Intel-based product in our portfolio."

So AMD has a performance problem by not supporting PCI Express Gen 3 and it now has a customer perception problem, even in the HPC market. AMD always had to overcome Intel's marketing billions in the somewhat irrational consumer market, but when the firm loses HPC customers based on perception and doesn't have the performance advantage in core technologies, then it really is shaping up to be a vicious cycle, and one AMD can ill afford to be caught in.

Cray still has a good relationship with AMD according to Bolding and I have no doubt that it does, but given the firm's closer links to Intel over the course of 2012 [1] [2] it is safe to say that Cray is no longer the bankable customer for AMD that it was five years ago. My conversation with Bolding ended with what I can only say should be taken as a stark warning by AMD because it could be applied to any of the firm's large Opteron customers.

Bolding said, "We are getting more experience with Intel and we believe the Xeon line is greatly improved than what it was in previous generations. They [Intel] have had some hiccups along the way but we do believe Sandy Bridge and Ivy Bridge are world class performance for high performance computing. The decision was network based and flexibility based, we need to be able to choose the best processor in any generation."

It should be noted that Gopalakrishnan joined AMD earlier this year, long after design decisions for the firm's Piledriver Opteron had been set in stone, but the case of PCI Express Gen 3 support highlights where the firm has been going wrong for so many years. Intel can afford to come late to the party with 95 percent of the server market, but AMD should be the early adopter of new technologies in a bid to increase its market share. Instead the firm seems to be mired in making poor design decisions, allowing its rivals the opportunity to not just steal a march on AMD but leave it for dead.