
Networking Is Becoming a Strategic Layer of AI


OpenAI, alongside a consortium of major technology players including AMD, Broadcom, Intel, Microsoft, and NVIDIA, has introduced a new networking protocol designed to prevent congestion and hardware failures from disrupting large-scale AI clusters—underscoring how networking is becoming just as strategic as the compute itself.

Large-scale AI training depends on thousands of GPUs working together in tight synchronization. When one part of the network slows down or fails, the impact can ripple across the entire training job. Multipath Reliable Connection (MRC) focuses on reducing that risk by improving performance, resilience, and predictability across very large XPU clusters.


Ethernet Is Moving into the AI Supercomputer Core

One of the most important signals from this announcement is the continued shift from InfiniBand toward Ethernet-based AI networking. InfiniBand has played a major role in high-performance computing and AI clusters, but Ethernet is becoming increasingly attractive because of its scale, openness, broad supplier base, and operational familiarity.

[Figure: Data Center Switch Revenue in Scale-out Back-end Networks]

MRC extends RoCE, or RDMA over Converged Ethernet, and combines it with techniques such as multipath packet spraying and SRv6 source routing to make Ethernet more resilient for synchronous AI training workloads.

This does not mean InfiniBand disappears overnight. But it does show that Ethernet is rapidly evolving from a general-purpose data center technology into a serious foundation for the largest AI supercomputers. For the industry, that matters. A stronger Ethernet ecosystem could reduce dependency on a single networking approach, expand vendor participation, and give cloud providers and AI labs more flexibility in how they design infrastructure.


Open Standards Matter at AI Scale

The second major takeaway is the importance of openness and diversity. OpenAI’s decision to release the MRC specification through the Open Compute Project is significant because AI infrastructure is becoming too large and complex for closed, vertically integrated systems to scale efficiently.

Open standards can help align silicon vendors, cloud providers, system builders, and AI labs around common building blocks.


Diversity Is a Practical Requirement, Not Just a Principle

That diversity is not just philosophical. It is practical. The AI infrastructure market needs multiple suppliers for XPUs, NICs, switches, cloud platforms, and software layers. As demand for AI compute continues to rise, industry-wide collaboration can improve resilience, mitigate supply risk, reduce bottlenecks, and accelerate deployment.


From Specification to Real-World Deployment

The third major takeaway is that MRC is not just a research concept; it is already running in production. OpenAI says MRC is deployed across its largest NVIDIA GB200 supercomputers, including its site with Oracle Cloud Infrastructure in Abilene, Texas, and Microsoft’s Fairwater supercomputers. Both deployments use NVIDIA’s Spectrum-X Ethernet switches.

More broadly, these deployments validate the accelerating shift toward Ethernet in large-scale AI clusters. According to Dell’Oro Group’s Data Center Switch—AI Back-end Networks report, NVIDIA and Celestica together captured 50% of Ethernet switch revenue in AI back-end networks in 2025. Arista ranked third, despite a significant portion of its AI-related product revenue being deferred.

[Figure: Ethernet Data Center Switch Revenue Share in AI Back-end Networks, 2025]


The Bigger Industry Message

For the broader industry, the message is clear: AI infrastructure is entering a new phase. The question is no longer only who has the most XPUs, but who can connect them efficiently, operate them reliably, and keep them productive at massive scale.