
A few months after Upscale AI introduced SkyHammer—its clean-slate, open-standards scale-up platform designed to make XPUs “behave like a single coherent machine”—the firm is now extending its vision for open AI networking infrastructure into the scale-out domain, where clusters expand horizontally across multiple racks and, increasingly, across multiple data centers. To that end, Upscale AI is announcing a strategic partnership with NVIDIA aimed at accelerating the deployment of open, scale-out AI networking infrastructure for next-generation data centers.

The collaboration brings together NVIDIA’s Spectrum-X Ethernet switch silicon and Upscale AI’s AI-optimized, SONiC-based networking software to deliver interoperable, high-performance Ethernet fabrics designed for large-scale AI workloads.

As enterprises and neocloud providers expand AI clusters, networking has emerged as a critical bottleneck. The partnership focuses on enabling these customers to deploy scalable, low-latency networking systems that support heterogeneous environments spanning compute, accelerators, memory, and storage.

Open Infrastructure for Heterogeneous AI Environments

As part of the initiative, Upscale AI has joined the NVIDIA Partner Network. The partnership is intended to give customers greater flexibility in how they design and procure AI infrastructure, including deploying Ethernet switching powered by NVIDIA Spectrum silicon in heterogeneous, multi-vendor environments. This collaboration reflects a step toward more interoperable Ethernet infrastructure for AI deployments, while maintaining operational consistency at scale.

Focus on AI-Optimized SONiC

A core element of Upscale AI’s approach is its AI-optimized implementation of SONiC, the open-source network operating system widely used in hyperscale environments.

At Dell’Oro Group, we expect SONiC adoption in AI back-end networks to proceed much faster than what we have historically observed in front-end networks. This faster uptake will be driven by several tailwinds on both the demand and supply sides.

On the demand side, a growing number of fast-growing AI model builders and neocloud providers are evaluating SONiC to diversify vendors, reduce platform lock-in, and gain greater control over their network infrastructure. Vendor diversification also helps mitigate risk, especially as supply availability tightens.

On the supply side, an expanding roster of established vendors and new entrants is supporting the SONiC ecosystem. We expect SONiC-based switch sales in AI scale-out networks to grow at more than 50% CAGR (2025-2030), exceeding $10 B by 2030.
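
As a back-of-the-envelope check, the sketch below derives the 2025 base value implied by those two figures. The inputs come straight from the forecast; the derived base is our own illustration, not a published number.

```python
# Back-compute the 2025 base implied by a >50% CAGR (2025-2030)
# and a 2030 value exceeding $10 B. Illustrative arithmetic only.

def cagr_value(base: float, rate: float, years: int) -> float:
    """Future value after compounding `base` at `rate` for `years` years."""
    return base * (1 + rate) ** years

target_2030 = 10e9   # $10 B by 2030 (forecast figure)
rate = 0.50          # 50% CAGR (forecast figure)
years = 5            # 2025 -> 2030

implied_2025_base = target_2030 / (1 + rate) ** years
print(f"Implied 2025 base: ${implied_2025_base / 1e9:.2f} B")  # ~$1.32 B
print(f"Check, 2030 value: ${cagr_value(implied_2025_base, rate, years) / 1e9:.0f} B")
```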


Addressing a Critical Gap with Fully Integrated AI Infrastructure for Enterprise and Neocloud Customers

Historically, SONiC adoption has been spearheaded by hyperscalers. However, deploying and operating an open-source network operating system like SONiC demands substantial in-house engineering expertise and integration effort—capabilities many smaller cloud providers and enterprises lack. In addition, SONiC’s broader ecosystem support—such as turnkey distributions, enterprise-grade tooling, and vendor-backed support—has lagged that of proprietary network operating systems, limiting SONiC adoption beyond hyperscale environments.

Upscale AI plans to bridge this gap by delivering fully integrated solutions that combine hardware, software, and lifecycle services, targeted at organizations building medium- and large-scale AI environments.

While the first wave of AI has been driven primarily by large AI model builders—namely hyperscalers—the second wave is expected to be led by other cloud providers, including neocloud providers, as well as large enterprises. Together, these customer segments are projected to account for the majority of Ethernet data center switch sales in scale-out networks by 2030.

Stitching Together an Open Fabric for AI

SkyHammer was step one. Scale-out is step two. Upscale AI is stitching together an open networking story—from the scale-up interconnect that makes XPUs act like one system, to the Ethernet fabric that lets AI environments grow horizontally while preserving multi-vendor flexibility. The NVIDIA partnership helps validate that direction and accelerates the scale-out side of the roadmap, reinforcing Upscale AI’s broader goal: open, interoperable AI networking infrastructure from pod to cluster.


As 2025 comes to a close, we reflect on several remarkable milestones achieved by the data center switching market this year, and what 2026 may have in store for us.

Looking back at 2025, several clear inflection points reshaped the market:

  • Ethernet overtakes InfiniBand in AI back-end networking: Supported by strong tailwinds on both the supply and demand sides, 2025 marked a decisive turning point for AI back-end networks, as Ethernet surpassed InfiniBand in market adoption. This shift is particularly striking given that just two years ago, InfiniBand accounted for nearly 80% of data center switch sales in AI back-end networks.

[Chart: Dell'Oro Group Predictions for 2026 - Data Center Switch market]

  • Overall Ethernet Data Center Switch sales nearly doubled compared with 2022: The rapid adoption of Ethernet in AI back-end deployments propelled total Ethernet data center switch sales to an all-time high in 2025, nearly doubling annual revenues compared with 2022 levels.
  • 800 Gbps surpassed 20 M ports within just three years of shipments: As a point of reference, it took 400 Gbps six to seven years to achieve the same milestone.
  • The vendor landscape shifted meaningfully toward AI-exposed players: Vendors with greater exposure to AI back-end networking significantly outperformed the broader market in 2025. Companies such as Accton, Celestica and NVIDIA were among the primary beneficiaries of this shift, reflecting how AI-driven demand is reshaping competitive dynamics. Arista maintained the leading position in the Total Ethernet Data Center Switching market.

[Chart: Dell'Oro Group Predictions 2026 - Data Center Switch Front-end Networks and AI Back-end Networks]

Looking ahead to 2026, questions are emerging around whether the pace of investment can be sustained after such an extraordinary year. While skepticism around AI returns on investment is growing, we believe the industry is still in the early innings of a multi-year AI investment cycle. Based on the latest capital expenditure outlooks from the large hyperscalers (Google, Amazon, Microsoft, Meta, Oracle and others), we expect another strong year of AI-related investment in 2026, which should continue to drive robust spending across the networking portion of the infrastructure stack.

Networking is becoming increasingly critical, as it plays a central role in addressing some of the most challenging scaling bottlenecks in AI deployments—including power availability and compute demand. Below are some of the inflection points expected for 2026:

  • Demand remains exceptionally strong in AI back-end networking. We continue to expect strong double-digit growth in AI networking spending, driven by the ongoing scale-out of AI clusters. The integration of co-packaged optics could further accelerate market growth, as optics would easily add multiple billions of dollars to the market size.
  • Supply constraints remain the primary risk to our forecast. We expect demand to continue to outpace supply, with shortages in chips, memory, and other critical components representing the main caveats to our outlook. As a result, the market remains supply-constrained rather than demand-constrained—a challenging dynamic, but ultimately a more favorable one than the reverse.
  • Scale-up emerges as a new battlefield for Ethernet. After securing a leading position in the scale-out segment of AI back-end networks, Ethernet is now expanding into scale-up, where NVLink has historically dominated. In this space, Ethernet will compete not only with NVLink but also with UALink, an open alternative to NVLink. We anticipate 2026 will be a year full of vendor announcements targeting both Ethernet and UALink opportunities in scale-up. Scale-up represents what could be the largest total addressable market expansion the industry has ever seen.
  • 1.6 Tbps switches expected to ship in volume in 2026. 2026 will mark the first year of volume deployments of 1.6 Tbps switches, driven by the insatiable demand for high bandwidth in AI clusters. The 1.6 Tbps ramp is expected to be even faster than the 800 Gbps ramp, surpassing 5 M ports within one to two years of shipments.
  • Co-packaged optics (CPO) expected to ramp on both InfiniBand and Ethernet switches. After many years of development and debate, 2026 is expected to see the initial volume ramp of CPO on both InfiniBand and Ethernet switches. On the demand side, major hyperscalers are actively trialing the technology. On the supply side, while NVIDIA is leading the way, we expect other vendors to follow shortly.
  • Vendor diversity set to increase in 2026. As AI clusters continue to scale, vendor diversity, spanning both incumbent vendors and new entrants, will become increasingly important to ensure risk mitigation and supply availability. We believe that no single vendor can meet the full demand for AI infrastructure. As a result, we expect SONiC adoption to accelerate in both scale-up and scale-out deployments, as it will be critical in enabling this broader vendor ecosystem.

In summary, as we look ahead to 2026, the AI-driven data center landscape is set to continue its rapid evolution. From Ethernet’s rise in AI back-end networks and the emergence of scale-up as a new battlefield, to the adoption of 1.6 Tbps switches, co-packaged optics, and a more diverse vendor ecosystem, the infrastructure supporting AI is expanding in both scale and complexity. While supply constraints and ROI questions remain challenges, the industry is clearly in the early innings of a multi-year AI journey. Networking, in particular, will play a pivotal role in enabling the next phase of AI growth, making 2026 an exciting year for both innovation and investment.


Across hyperscalers and sovereign clouds alike, the race is shifting from just model supremacy to infrastructure supremacy. The real differentiation is now in how efficiently GPUs can be interconnected and utilized. As AI clusters scale beyond anything traditional data center networking was built for, the question is no longer “how fast can you train?” but “can your network keep up?” This is where emerging architectures like Optical Circuit Switches (OCS) and Optical Cross-Connects (OXC), technologies used in wide area networks for decades, enter the conversation.

The Network is the Computer for AI Clusters

The new age of AI reasoning is ushering in three new scaling laws—spanning pre-training, post-training, and test-time scaling—that together are driving an unprecedented surge in compute requirements. At GTC 2025, Jensen Huang stated that demand for compute is now 100× higher than what was predicted just a year ago. As a result, the size of AI clusters is exploding, even as the industry aggressively pursues efficiency breakthroughs—what many now refer to as the “DeepSeek moment” of AI deployment optimization.

As the chart illustrates, AI clusters are rapidly scaling from hundreds of thousands of GPUs to millions of GPUs. Over the next five years, about 124 gigawatts of capacity are expected to come online, equivalent to more than 70 million GPUs deployed. In this reality, the network will play a key role in connecting those GPUs in the most optimized, efficient way. The network is the computer for AI clusters.
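
Those two headline figures imply a simple rule of thumb for the power envelope per deployed GPU. The quick calculation below is our own derivation, treating the capacity number as an all-in budget that includes networking, cooling, and facility overhead, not just the GPU itself.

```python
# Implied all-in power budget per GPU from the cited figures.
capacity_gw = 124   # capacity expected to come online over five years
gpus = 70e6         # equivalent GPUs to be deployed over the same period

watts_per_gpu = capacity_gw * 1e9 / gpus
print(f"Implied all-in budget: ~{watts_per_gpu / 1e3:.1f} kW per GPU")  # ~1.8 kW
```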


Challenges in Operating Large Scale AI Clusters

As shown in the chart above, the number of interconnects grows much faster than linearly with the number of GPUs. This rapid increase drives significant cost, power consumption, and latency. It is not just the number of interconnects that is exploding—the speed requirements are rising just as aggressively. AI clusters are fundamentally network-bound, which means the network must operate at nearly 100 percent efficiency to fully utilize the extremely expensive GPU resources.
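
To make that scaling concrete, here is a minimal sketch of an illustrative model we constructed: a non-blocking folded-Clos (leaf-spine) fabric built from 64-port switches, in which every added tier contributes roughly one fabric link per GPU. Real deployments use oversubscription and rail-optimized designs, so treat the output as directional rather than exact.

```python
# Illustrative link-count model for a non-blocking folded-Clos fabric.
# A fabric with t tiers of radix-R switches supports up to 2 * (R/2)**t
# endpoints; with full bisection bandwidth, each tier adds roughly one
# link per endpoint, so link count grows faster than linearly with GPUs.

def fabric_links(gpus: int, radix: int = 64) -> tuple[int, int]:
    """Return (tiers, approximate total fabric links) for `gpus` endpoints."""
    tiers = 1
    while 2 * (radix // 2) ** tiers < gpus:  # grow until the fabric fits
        tiers += 1
    return tiers, tiers * gpus

for n in (1_000, 10_000, 100_000, 1_000_000):
    t, links = fabric_links(n)
    print(f"{n:>9,} GPUs -> {t} tiers, ~{links:,} fabric links")
```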

Another major factor is the refresh cadence. AI back-end networks are refreshed roughly every two years or less, compared to about five years in traditional front-end enterprise environments. As a result, speed transitions in AI data centers are happening at almost twice the pace of non-accelerated infrastructure.

Looking at switch port shipments in AI clusters, we expect the majority of ports in 2025 will be 800 Gbps. By 2027, the majority will have transitioned to 1.6 Tbps, and by 2030, most ports are expected to operate at 3.2 Tbps. This progression implies that the data center network’s electrical layer will need to be replaced at each new bandwidth generation—a far more aggressive upgrade cycle than what the industry has historically seen in front-end, non-accelerated infrastructure.


The Potential Role of OCS in AI Clusters

Optical Circuit Switches (OCS) or Optical Cross-Connects (OXC) are network devices that establish direct, light-based optical paths between endpoints, bypassing the traditional packet-switched routing pipeline to deliver near-zero-latency connectivity with massive bandwidth efficiency. Google was the first major hyperscaler to deploy OCS at scale nearly a decade ago, using it to dynamically rewire its data center topology in response to shifting workload patterns and to reduce reliance on power-hungry electrical Ethernet fabrics.

A major advantage of OCS is that it is fundamentally speed-agnostic—because it operates entirely in the optical domain, it does not need to be upgraded each time the industry transitions from 400 Gbps to 800 Gbps to 1.6 Tbps or beyond. This stands in stark contrast to traditional electrical switching layers, which require constant refreshes as link speeds accelerate. OCS also eliminates the need for optical-electrical-optical (O-E-O) conversion, enabling pure optical forwarding that not only reduces latency but also dramatically lowers power consumption by avoiding the energy cost of repeatedly converting photons to electrons and back again.
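
A toy model makes the contrast with packet switching explicit. The sketch below is our own simplification (not any vendor's implementation): an OCS holds only a physical port-to-port mapping and never parses, buffers, or re-times the signal, which is exactly why it is rate-agnostic and needs no O-E-O conversion.

```python
# Toy optical circuit switch: a reconfigurable port-to-port mapping.
# Because the device steers light without inspecting bits, the same
# hardware carries 400G, 800G, or 1.6T signals unchanged.

class OpticalCircuitSwitch:
    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.cross_connects: dict[int, int] = {}  # in_port -> out_port

    def connect(self, in_port: int, out_port: int) -> None:
        """Steer the MEMS mirror / LCOS element for this input port."""
        if out_port in self.cross_connects.values():
            raise ValueError("output port already in use")
        self.cross_connects[in_port] = out_port

    def forward(self, in_port: int, signal: bytes) -> tuple[int, bytes]:
        """Pass the signal through untouched: no parsing, no buffering."""
        return self.cross_connects[in_port], signal

# Re-wiring the topology is just rewriting the mapping; there is no
# dependency on the line rate of the attached transceivers.
ocs = OpticalCircuitSwitch(num_ports=128)
ocs.connect(in_port=1, out_port=42)
print(ocs.forward(1, b"photons at any line rate"))
```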

The combined benefit is a scalable, future-proof, ultra-efficient interconnect fabric that is uniquely suited for AI and high-performance computing (HPC) back-end networks, where east-west traffic is unpredictable and bandwidth demand grows faster than Moore’s Law. As AI workload intensity surges, OCS is being explored as a way to optimize the network.


OCS is a Proven Technology

Using an OCS in a network is not new. It was, however, called by different names over the past three decades: OOO Switch, all-optical switch, optical switch, and optical cross-connect (OXC). Currently, the most popular term for these systems used in data centers is OCS.

OCS technology has been used in the wide area network (WAN) for many years to solve a similar problem set, and for many of the same reasons, tier-one operators worldwide have adopted it strategically. As a result, OCSs have operated in carrier networks with the strictest performance and reliability requirements for over a decade. Additionally, the base optical technologies, both MEMS and LCOS, have been widely deployed in carrier networks and have operated without fault for even longer. Stated another way, OCS is based on field-proven technology.

Whether used in a data center or to scale across data centers, an OCS offers several benefits that translate into lower costs over time.

To address the specific needs of AI data centers, companies have launched new OCS products purpose-built for these environments.


Final Thought

AI infrastructure is diverging from conventional data center design at an unprecedented pace, and the networks connecting GPUs must evolve even faster than the GPUs themselves. OCS is not an exotic research architecture; it is a proven technology that is ready to be explored and considered for use in AI networks as a way to differentiate and evolve them to meet the stringent requirements of large AI clusters.


With a wave of announcements coming out of GTC, countless articles and blogs have already covered the biggest highlights. Rather than simply rehashing the news, I want to take a different approach—analyzing what stood out to me from a networking perspective. As someone who closely tracks the market, it’s clear that AI workloads are driving a steep disruption in networking infrastructure. While a number of announcements at GTC25 were compute-related, NVIDIA made it clear that implementations of next-generation GPUs and accelerators wouldn’t be made possible without major innovations on the networking side.

1) The New Age of AI Reasoning Driving 100X More Compute Than a Year Ago

Jensen highlighted how the new era of AI reasoning is driving the evolution of scaling laws, transitioning from pre-training to post-training and test-time scaling. This shift demands an enormous increase in compute power to process data efficiently. At GTC 2025, he emphasized that the required compute capacity is now estimated to be 100 times greater than what was anticipated just a year ago.

2) The Network Defines the AI Data Center

The way AI compute nodes are connected will have profound implications on efficiency, cost, and performance. Scaling up, rather than scaling out, offers the lowest latency, cost, and power consumption when connecting accelerated nodes in the same compute fabric. At GTC 2025, NVIDIA unveiled plans for its upcoming NVLink 6/7 and NVSwitch 6/7, key components of its next-generation Rubin platform, reinforcing the critical role of NVLink switches in its strategy. Additionally, the Spectrum-X switch platform, designed for scaling out, represents another major pillar of NVIDIA’s vision (Chart). NVIDIA is committed to a “one-year rhythm”, with networking keeping pace with GPU requirements. Other key details from NVIDIA’s roadmap announcement also caught our attention, and we are excited to share these with our clients.

Source: NVIDIA GTC25


3) Power Is the New Currency

The industry is more power-constrained than ever. NVIDIA’s next-generation Rubin Ultra is designed to accommodate 576 dies in a single rack, consuming 600 kW—a significant jump from the current Blackwell rack, which already requires liquid cooling and consumes between 60 kW and 120 kW. Additionally, as we approach 1 million GPUs per cluster, power constraints are forcing these clusters to become highly distributed. This shift is driving an explosion in the number of optical interconnects, both intra- and inter-data center, which will exacerbate the power challenge. NVIDIA is tackling these power challenges on multiple fronts, as explained below.
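
For context, the cited figures work out as follows; this is simple arithmetic on the stated numbers, and the per-die value is our derivation rather than an NVIDIA specification.

```python
# Per-die power and rack-level jump implied by the cited figures.
rubin_ultra_rack_kw = 600
rubin_ultra_dies = 576
blackwell_rack_kw = (60, 120)  # stated range for current Blackwell racks

print(f"Rubin Ultra: ~{rubin_ultra_rack_kw / rubin_ultra_dies:.2f} kW per die")
print(f"Rack power jump vs. Blackwell: "
      f"{rubin_ultra_rack_kw / blackwell_rack_kw[1]:.0f}x to "
      f"{rubin_ultra_rack_kw / blackwell_rack_kw[0]:.0f}x")
```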

4) Liquid-Cooled Switches Will Become a Necessity, Not a Choice

After liquid cooling racks and servers, switches are next. NVIDIA’s latest 51.2 T Spectrum-X switches offer both liquid-cooled and air-cooled options. However, all future 102.4 T Spectrum-X switches will be liquid-cooled by default.

5) Co-packaged Optics (CPO) in Networking Chips Before GPUs

Another key reason for liquid cooling racks is to maximize the number of GPUs within a single rack while leveraging copper for short-distance connectivity—”Copper when you can, optics when you must.” When optics are necessary, NVIDIA has found a way to save power with Co-Packaged Optics (CPO). NVIDIA plans to make CPO available on its InfiniBand Quantum switches in 2H25 and on its Spectrum-X switches in 2H26. However, NVIDIA will continue to support pluggable optics across different SKUs, reinforcing our view that data centers will adopt a hybrid approach to balance performance, efficiency, and flexibility.

Source: NVIDIA GTC25


6) Impact on Ethernet Switch Vendor Landscape

According to our AI Networks for AI Workloads report, three major vendors dominated the Ethernet portion of the AI Network market in 2024.

However, over the next few years, we anticipate greater vendor diversity at both the chip and system levels. Photonic integration in switches will introduce a new dimension, potentially reshaping the dynamics of an already vibrant vendor landscape. We foresee a rapid pace of innovation in the coming years—not just in technology, but at the business model level as well.

Networking could be the key factor that shifts the balance of power in the AI race, and customers’ appetite for innovation and cutting-edge technologies is at an unprecedented level. As one hyperscaler put it during a panel at GTC 2025: “AI infrastructure is not for the faint of heart.”

For more detailed views and insights on the AI Networks for AI Workloads report, please contact us at dgsales@delloro.com.


Significant Share Shifts Expected in 2025 as Ethernet Gains Momentum in AI Back-end Networks

The networking industry is experiencing a dramatic shift, driven by the rise of AI workloads and the need for new AI back-end networks to connect an ever-increasing number of accelerators in large AI clusters. While investments in AI back-end networks are reaching unprecedented levels, traditional front-end networks needed to connect general-purpose servers remain essential.

At Dell’Oro Group, we’ve just updated our five-year forecast reports for both the front-end and the back-end, and we’re still bullish on both. Below are some key takeaways:


AI Back-End Network Spending Set to Surpass $100B through 2029 with Ethernet Gaining Momentum

Despite growing concerns about the sustainability of spending on accelerated infrastructure—especially in light of DeepSeek’s recent open-source model, which requires significantly fewer resources than its U.S. counterparts—we remain optimistic. Recent data center capex announcements by Google, Amazon, Microsoft, and Meta in their January/February earnings calls showed an ongoing commitment to a sustained high level of AI infrastructure spending, which supports that view.

We again raised our forecast for data center switch sales in AI back-end networks in our January 2025 report. However, not all technologies are benefiting equally.

Ethernet is experiencing significant momentum, propelled by supply and demand factors. More large-scale AI clusters are now adopting Ethernet as their primary networking fabric. One of the most striking examples is xAI’s Colossus, a massive NVIDIA GPU-based cluster that has opted for Ethernet deployment.

We therefore revised our projections, moving up the anticipated crossover point where Ethernet surpasses InfiniBand to 2027.

Major share shifts anticipated for Ethernet AI Back-end Networks in 2025

While Celestica, Huawei, and NVIDIA dominated the Ethernet segment in 2024, the competitive landscape is set to evolve in 2025, with Accton, Arista, Cisco, Juniper, Nokia, and other vendors expected to gain ground. We expect the vendor landscape in AI Back-end networks to remain very dynamic as Cloud SPs hedge their bets by diversifying their supply on both the compute side and the networking that goes with it.


Strong Rebound in Front-end Networks Spending in 2025 and Beyond

Despite the challenges in 2024, we expect growth in the front-end market to resume in 2025 and beyond, driven by several factors. These include the need to build additional capacity in front-end networks to support back-end deployments, especially for greenfield projects. These additional front-end connectivity deployments are expected to run at high speeds (>100 Gbps), commanding a price premium. Sales growth will be further stimulated by inferencing applications that may not require accelerated servers and will instead run on front-end networks, whether at centralized locations or edge sites.


The Road Ahead

As AI workloads expand and diversify, the networking infrastructure that supports them, in both the front end and the back end, must evolve accordingly. The transition to higher-speed Ethernet and the shifting competitive landscape among vendors suggest that 2025 could be a pivotal year for the Ethernet data center switching market.

For more detailed views and insights on the Ethernet Switch—Data Center report or the AI Networks for AI Workloads report, please contact us at dgsales@delloro.com.