
At this year’s NVIDIA GTC, the narrative has moved decisively beyond the initial shift to accelerated computing. What stood out in 2026 is not just the continuation of that trend, but the expansion of AI infrastructure into a heterogeneous, domain-specific ecosystem.

As an analyst covering data center compute, the key takeaway is clear: the industry is entering its next phase—where optimization, not just scale, becomes the defining battleground.

 

From Retrieval to Generative—and Now to Reasoning Infrastructure

Hyperscaler workloads have evolved rapidly from retrieval-based systems toward generative AI, and now increasingly toward reasoning-driven architectures. Internal workloads such as search are being fundamentally re-architected around AI models, signaling a structural shift in how compute is deployed.

This transition continues to drive strong demand for accelerated computing. At Dell'Oro Group, we project global data center capex to exceed $1.7 trillion by 2030. These estimates could prove conservative given the scale of investment being signaled by hyperscalers, including multi-hundred-billion-dollar capex trajectories and long-term, large-scale infrastructure commitments.

 

The Emergence of LPUs: A Potential Inflection Point

LPUs, particularly through NVIDIA’s partnership with Groq, represent one of the more strategically important developments. Their SRAM-based architecture is optimized for low latency and strong performance per watt, enabling lower cost per token for inference and reasoning workloads.

This introduces greater flexibility in infrastructure design. Different service tiers can be optimized independently, with throughput-oriented configurations for lower-cost services and latency-sensitive deployments for premium offerings. LPUs provide a mechanism to fine-tune this balance in ways that GPUs alone cannot fully achieve.
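To make the economics concrete, here is a minimal sketch comparing energy cost per million tokens for a throughput-oriented tier and a latency-oriented tier. All throughput, power, and electricity-price figures are hypothetical placeholders rather than vendor benchmarks, and a real comparison would also account for capex and utilization.

```python
# Minimal sketch: energy cost per million tokens for two service tiers.
# All throughput, power, and price figures are hypothetical placeholders.

def cost_per_million_tokens(tokens_per_sec: float, system_kw: float,
                            usd_per_kwh: float = 0.08) -> float:
    """Energy cost (USD) to serve one million tokens; ignores capex and overhead."""
    seconds = 1_000_000 / tokens_per_sec
    kwh = system_kw * seconds / 3600
    return kwh * usd_per_kwh

# Throughput-oriented GPU tier vs. latency-oriented LPU tier (illustrative only)
gpu_tier = cost_per_million_tokens(tokens_per_sec=20_000, system_kw=120)
lpu_tier = cost_per_million_tokens(tokens_per_sec=5_000, system_kw=15)
print(f"GPU tier: ${gpu_tier:.3f} per 1M tokens")
print(f"LPU tier: ${lpu_tier:.3f} per 1M tokens")
```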

Early deployments suggest LPUs can be configured at meaningful density. For example, a single Groq LPU rack can integrate hundreds of processors, highlighting the degree of parallelism available for inference and reasoning workloads. In practice, such systems are likely to be deployed alongside GPU clusters, with the ratio depending on workload mix and service requirements.

If adoption reaches even modest levels, LPUs could expand the silicon TAM for domain-specific accelerators. At the same time, it remains unclear whether LPUs will primarily complement GPUs or displace portions of certain workloads as operators optimize for overall system efficiency. More broadly, LPUs underscore the growing importance of architectural specialization tailored to specific workload requirements.

 

GPU Roadmap: Density and Scale Continue to Accelerate

NVIDIA continues to push aggressively on GPU density and system integration. Platforms such as Vera Rubin Ultra demonstrate this trajectory, with multi-die architectures, massive HBM capacity reaching the terabyte scale per package, and highly dense, liquid-cooled rack designs.

Future platforms such as Feynman are expected to push these limits further, increasing both compute density and system complexity. However, this rapid scaling introduces new constraints around power, cooling, and system balance. As a result, complementary architectures and more specialized components will play a growing role in maintaining overall efficiency. With compute costs remaining elevated and data center capex scaling into the hundreds of billions annually, operators will need to strategically align infrastructure with domain-specific workloads to maximize efficiency and reduce total cost of ownership.

 

Interconnects: Balancing Standards and Proprietary Innovation

Interconnect strategy remains central to NVIDIA’s roadmap. The company continues to balance proprietary innovation with industry standards, investing in both InfiniBand and Ethernet for scale-out connectivity while advancing NVLink as the backbone of scale-up architectures.

As scale-up domains expand, NVLink will increasingly need to extend beyond the rack and, over time, into the optical domain. This evolution is necessary to support larger, more tightly coupled compute fabrics, but also introduces new technical challenges.

The expansion of scale-up capabilities naturally raises the question of whether they could displace portions of traditional scale-out networking. In practice, both architectures will need to evolve in parallel. Scale-up enables higher performance within tightly coupled systems, while scale-out remains essential for resilience, workload distribution, and efficient utilization across clusters. This is increasingly true not only for training but also for inference, where distributed workloads and service-level requirements demand flexibility.

NVIDIA is also reducing its reliance on PCIe-based x86 systems by offering NVLink as an alternative interconnect. With initiatives such as NVLink Fusion and the development of its own CPU roadmap, the company is positioning NVLink as a broader system fabric that could extend beyond GPUs.

 

Connectivity, Networking, and System-Level Optimization

Connectivity is rapidly emerging as one of the primary constraints in next-generation AI infrastructure. Current systems are largely built on 200 Gbps SerDes, but the industry is already looking ahead to 400 Gbps SerDes. However, the transition to 400 Gbps presents significant challenges in signal integrity, power consumption, and packaging complexity, making the timeline aggressive and execution uncertain.
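The attraction of faster SerDes is simple arithmetic: at a fixed lane count per port, doubling the lane rate doubles the port speed. A quick illustration, assuming the common eight-lane port configuration:

```python
# At a fixed lane count, port speed scales linearly with SerDes lane rate.
LANES_PER_PORT = 8  # common octal-lane port configuration (assumption)

for lane_gbps in (100, 200, 400):
    port_tbps = LANES_PER_PORT * lane_gbps / 1000
    print(f"{lane_gbps}G SerDes -> {port_tbps:.1f} Tbps per {LANES_PER_PORT}-lane port")
```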

In this context, NVIDIA’s vertically integrated approach provides a meaningful advantage. Its control over InfiniBand technology, including SerDes development, allows it to move ahead of standard Ethernet ecosystems when necessary, particularly when industry standards lag behind system requirements.

At the same time, networking is no longer just about bandwidth. Smart NICs and DPUs, particularly NVIDIA's BlueField platform, are becoming increasingly central to system architecture, with the market projected to grow at a 30% CAGR over the next five years. DPUs are expanding into broader roles within AI infrastructure, managing data movement between compute, storage, and CPU domains while offloading networking and orchestration tasks from primary processors.
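For context, a 30% CAGR compounds quickly:

```python
# A 30% CAGR compounds to roughly 3.7x over five years.
cagr = 0.30
print(f"5-year growth multiple: {(1 + cagr) ** 5:.2f}x")
```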

Taken together, these trends point toward a broader shift to system-level optimization, where performance is increasingly determined by how effectively compute, networking, and storage are integrated across the entire infrastructure stack.

 

Expanding the Platform: Beyond GPUs to Full-Stack Infrastructure

While GPUs remain the foundation of AI infrastructure, NVIDIA is clearly extending its reach across the full data center stack. Beyond its focus on domain-specific accelerators, GTC 2026 also highlighted the dense Vera CPU platform optimized for orchestrating agentic AI workloads, as well as the STX platform designed for KV cache-based context memory. A central theme underpinning this expansion is the increasing importance of co-design—bringing together compute, networking, and storage disciplines into a unified, system-level architecture rather than optimizing each component in isolation.

Taken together, these developments signal a clear expansion of NVIDIA’s total addressable market—from GPUs alone to a broader, full-stack infrastructure opportunity spanning compute, networking, and storage.

 

From Scale to Optimization: The Path Forward

NVIDIA’s rapid innovation cadence raises important questions around long-term economics, particularly as systems become more complex and capital-intensive. Maintaining a strong return on investment will depend not only on hardware performance, but on how effectively these systems can be utilized over time.

Here, NVIDIA’s software ecosystem remains a key advantage. CUDA provides continuity across generations, allowing developers to extract incremental performance improvements and enabling mixed-generation deployments that improve overall total cost of ownership.

More broadly, GTC 2026 makes it clear that the industry is moving beyond the initial phase of scaling AI infrastructure and into one defined by optimization and specialization. The shift toward heterogeneous architectures, combined with a growing focus on efficiency and workload-specific design, is reshaping how data centers are built and operated.

The hyperscale AI infrastructure buildout is entering a more mature phase. After several years of rapid regional expansion driven by resilience, redundancy, and data sovereignty, hyperscalers are now focused on scaling AI compute and supporting infrastructure efficiently. As we move into 2026, the cycle is increasingly defined by capex discipline and execution risk, even as absolute investment levels remain historically high.

Accelerated Servers Remain the Core Spending Driver

Spending on high-end accelerated servers rose sharply in 2025 and continues to anchor AI infrastructure investment heading into 2026. These platforms pull through demand for GPUs and custom accelerators, HBM, high-capacity SSDs, and high-speed NICs and networks used in large AI clusters. While frontier model training remains important, a growing share of deployments is now driven by inference workloads, as hyperscalers scale AI services to millions of users globally.

This shift meaningfully expands infrastructure requirements, as inference workloads require higher availability, geographic distribution, and tighter latency guarantees than centralized training clusters.

 

GPUs Continue to Dominate Component Revenue

High-end GPUs will remain the largest contributor to component market revenue growth in 2026, even as hyperscalers deploy more custom accelerators to optimize cost, power efficiency, and workload-specific performance at scale. NVIDIA is expected to begin shipping the Vera Rubin platform in 2H26, which increases system complexity through higher compute and networking density and optional Rubin CPX inference GPU configurations, materially boosting component attach rates.

AMD is positioning to gain share with its MI400 rack-scale platform, supported by recently announced wins at OpenAI and Oracle. Despite growing competition, GPUs continue to command outsized revenue due to higher ASPs and broader ecosystem support.

 

Near-Edge Infrastructure Becomes Critical for Inference

As AI inference demand accelerates, hyperscalers will need to increase investment in near-edge data centers to meet latency, reliability, and regulatory requirements. These facilities—located closer to population centers than centralized hyperscale regions—are essential for real-time, user-facing AI services such as copilots, search, recommendation engines, and enterprise applications.
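The physics behind this is straightforward: light in optical fiber covers roughly 200,000 km/s, or about 5 microseconds per kilometer one way. The back-of-envelope sketch below applies that rule of thumb to show how quickly propagation delay alone consumes an interactive latency budget (queuing, serialization, and compute time come on top).

```python
# Back-of-envelope: fiber propagation delay vs. distance to the user.
# Light in fiber travels ~200,000 km/s, i.e., ~5 microseconds per km one way.
US_PER_KM_ONE_WAY = 5.0

for km in (50, 200, 1000, 3000):
    rtt_ms = 2 * km * US_PER_KM_ONE_WAY / 1000
    print(f"{km:>5} km -> ~{rtt_ms:4.1f} ms round trip (propagation only)")
```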

Near-edge deployments typically favor smaller but highly dense accelerated clusters, with strong requirements for high-speed networking, local storage, and redundancy. While these sites do not approach the power scale of centralized AI campuses, their sheer number and geographic dispersion represent a meaningful incremental capex requirement heading into 2026. In contrast, far-edge deployments remain more use-case dependent and are unlikely to see material growth until ecosystems and application demand further mature.

 

Networking and CPUs Transition Unevenly

The x86 CPU and NIC markets tied to general-purpose servers are expected to decelerate in 2026 following short-term inventory digestion. In contrast, demand for high-speed networking remains tightly linked to accelerated compute growth. Even as inference workloads outpace training, inference accelerators continue to rely on scale-out fabrics to support utilization, redundancy, and ultra-low latency.

 

Supply Chains Tighten as Component Costs Rise

AI infrastructure supply chains are becoming increasingly constrained heading into 2026. Memory vendors are prioritizing production of higher-margin HBM, limiting capacity for conventional DRAM and NAND used in AI servers. As a result, memory and storage prices are rising sharply, increasing system-level costs for accelerated platforms.

Beyond memory, longer lead times for advanced substrates, optics, and high-speed networking components are adding further volatility to the supply chain. In parallel, tariff uncertainty and evolving trade policy introduce additional supply-chain risk and could elevate component pricing over the medium term.

 

Capex Remains Elevated, but ROI Scrutiny Intensifies

US hyperscale cloud service providers continue to raise capex guidance, reinforcing the continuity of the multi-year AI investment cycle into 2026. Accelerated computing, greenfield data center builds, near-edge expansion, and competitive pressures remain strong tailwinds. Changes in depreciation treatment provide levers to optimize cash flow and support near-term investment levels.

However, infrastructure investment has outpaced revenue growth, increasing scrutiny around capex intensity, depreciation, and long-term returns. While cash flow timing can be managed, underlying ROI depends on successful AI monetization, increasing the risk of margin pressure if revenue growth lags infrastructure deployment.


NVIDIA recently introduced fully integrated systems, such as the GB200/300 NVL72, which combine Blackwell GPUs with Grace ARM CPUs and leverage NVLink for high-performance interconnects. These platforms showcase what’s possible when the CPU–GPU connection evolves in lockstep with NVIDIA’s accelerated roadmap. As a result, ARM achieved a 25 percent revenue share of the server CPU market in 2Q25, with NVIDIA representing a significant portion due to strong adoption by major cloud service providers.

However, adoption of such proprietary systems may not reach its full potential in the broader enterprise market, as many customers prefer the flexibility of the open ecosystem and established CPU vendors that the x86 architecture offers. Yet the performance of GPU-accelerated applications on x86 has long been constrained by the pace of the PCIe roadmap for both scale-up and scale-out connectivity. While GPUs continue to advance on an 18-month (or shorter) cycle, CPU-to-GPU communication over PCIe has progressed more slowly, often limiting system-level GPU connectivity.

The new Intel–NVIDIA partnership is designed to close this gap. With NVLink Fusion available on Intel’s x86 platforms, enterprises can scale GPU clusters on familiar infrastructure while benefiting from NVLink’s higher bandwidth and lower latency. In practice, this brings x86 systems much closer to the scalability of NVIDIA’s own NVL-based rack designs, without requiring customers to fully commit to a proprietary stack.
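A rough sense of the bandwidth gap, using approximate public per-direction figures (PCIe 5.0 x16 at roughly 64 GB/s; H100-generation NVLink at roughly 450 GB/s); actual rates vary by generation and configuration, and the 80 GB shard size is purely illustrative:

```python
# Rough transfer-time comparison for an 80 GB weight shard (illustrative size).
# Bandwidths are approximate public per-direction figures and vary by generation.
PCIE_GEN5_X16_GBPS = 64   # PCIe 5.0 x16, ~64 GB/s per direction
NVLINK4_GBPS = 450        # H100-generation NVLink, ~450 GB/s per direction

shard_gb = 80
print(f"PCIe Gen5 x16: {shard_gb / PCIE_GEN5_X16_GBPS:.2f} s")
print(f"NVLink (4th gen): {shard_gb / NVLINK4_GBPS:.2f} s")
```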

For Intel, the agreement ensures continued relevance in the AI infrastructure market despite the lack of a competitive GPU portfolio. For server OEMs, it opens up new design opportunities: they can pair customized Intel x86 CPUs with NVIDIA GPUs in a wider range of configurations, creating more differentiated offerings from individual boards to full racks, while retaining flexibility for diverse workloads.

The beneficiaries of this development include:
  • NVIDIA, which extends NVLink adoption into the broader x86 ecosystem.
  • Intel, which can play a key role in the AI systems market despite lacking a competitive GPU portfolio, bolstered by NVIDIA’s $5 billion investment.
  • Server OEMs, which gain more freedom to innovate and differentiate x86 system designs.
At the same time, there are competitive implications:
  • AMD is unlikely to participate, as its CPUs compete with Intel and its GPUs compete with NVIDIA. The company continues to pursue its own interconnect strategy through UALink.
  • ARM may see reduced momentum for external enterprise AI workloads if x86 platforms can now support higher GPU scalability. That said, cloud providers may continue to use ARM for internal workloads and could explore custom ARM CPUs with NVLink Fusion.

Ultimately, NVLink Fusion on Intel x86 platforms narrows the gap between systems based on a mainstream architecture and NVIDIA’s proprietary designs. It aligns x86 and GPU roadmaps more closely, giving enterprises a more scalable path forward while preserving choice across CPUs, GPUs, and system architectures.

NVIDIA’s Vision for the Future of AI Data Centers: Scaling Beyond Limits

At NVIDIA GTC, Jensen Huang's keynote highlighted NVIDIA's growing presence in the data center market, which is projected to surpass $1 trillion by 2028, citing Dell'Oro Group's forecast. NVIDIA is no longer just a chip vendor; it has evolved into a provider of fully integrated, rack-scale solutions that encompass compute, networking, and thermal management. During GTC, NVIDIA also announced an AI Data Platform that integrates enterprise storage with NVIDIA accelerated computing to enable AI agents to provide real-time business insights to enterprise customers. This transformation is redefining how AI workloads are deployed at scale.

Jensen Huang delivering the GTC 2025 keynote (Source: NVIDIA)

The Blackwell Platform: Optimized for AI Training and Reasoning

NVIDIA's Blackwell platform represents a major leap in AI acceleration. Not only does it excel at training deep learning models, but it is also optimized for inference and reasoning, two key drivers of hyperscale capital expenditure growth in 2025. Reasoning models, which generate a significant number of tokens, operate differently from conventional AI models. Unlike traditional AI that directly answers queries, reasoning models use "thinking tokens" to process and refine their responses, mimicking cognitive reasoning. This process increases computational demands significantly.
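A common rule of thumb puts transformer inference at roughly 2 FLOPs per parameter per generated token. Under that approximation, with an illustrative 70B-parameter model and hypothetical token counts, a reasoning response that emits ten times the tokens costs ten times the compute:

```python
# Rule of thumb: transformer inference costs ~2 FLOPs per parameter per token.
# Model size and token counts are hypothetical.
def inference_flops(params: float, tokens: int) -> float:
    return 2 * params * tokens

PARAMS = 70e9                                        # illustrative 70B model
direct = inference_flops(PARAMS, tokens=500)         # conventional answer
reasoning = inference_flops(PARAMS, tokens=5_000)    # answer plus thinking tokens
print(f"Direct:    {direct:.2e} FLOPs")
print(f"Reasoning: {reasoning:.2e} FLOPs ({reasoning / direct:.0f}x)")
```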

The Evolution of Accelerated Computing

The unit of accelerated computing is evolving rapidly. It started with single accelerators, progressed to integrated servers like the NVIDIA DGX, and has now reached rack-scale solutions like the NVIDIA GB200 NVL72. Looking ahead, NVIDIA aims to scale even further with the upcoming Vera Rubin Ultra platform, featuring 576 GPUs interconnected in a single rack. Scaling up AI clusters introduces new challenges in interconnects and power density. However, as compute nodes scale into the hundreds of thousands (and beyond), the industry needs to address several key challenges:

1) Increasing Rack Density

AI data centers aim to pack GPUs as closely as possible to create a coherent compute fabric for large language model (LLM) training and real-time inference. The NVL72 already features extremely high density, necessitating liquid cooling for heat dissipation. With further scaling, interconnect distances will increase. The question arises: will copper cabling remain viable, or will the industry need to transition to optical interconnects, despite their higher cost and power draw?

2) The Shift to Multi-Die GPUs

One approach to boosting computational capacity has been increasing GPU die size. However, with the Vera Rubin platform, GPUs have already reached the reticle limit, necessitating a shift to multi-die architectures. This will increase the physical footprint and interconnect distance, posing further engineering challenges.

3) Surging Rack Power Density

As GPU size and node count increase, rack power density is skyrocketing. NVIDIA's GB200 NVL72 racks already consume 132 kW, and the upcoming Rubin Ultra NVL576 is projected to require 600 kW per rack. Given that AI data centers typically operate within a 50 MW range, fewer than 100 such racks can be housed in a single facility. This constraint demands a new approach to scaling AI infrastructure.
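The facility-level arithmetic is easy to verify, ignoring cooling and power-distribution overhead (which reduce usable IT capacity further):

```python
# Racks per facility at a given rack power, ignoring cooling and
# power-distribution overhead (which reduce usable IT capacity further).
FACILITY_MW = 50

for name, rack_kw in (("GB200 NVL72", 132), ("Rubin Ultra NVL576", 600)):
    racks = FACILITY_MW * 1000 // rack_kw
    print(f"{name}: ~{racks} racks per {FACILITY_MW} MW facility")
```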

4) Disaggregating AI Compute Across Data Centers

As power limitations become a bottleneck, AI clusters may need to be strategically distributed across multiple data centers based on power availability. This introduces the challenge of interconnecting these geographically dispersed clusters into a single virtual AI compute fabric. Coherent optics and photonics-based networking may be necessary to enable low-latency interconnects between data centers separated by miles. NVIDIA’s recently introduced silicon photonics switch may be part of this solution, at least from the standpoint of lowering power consumption, but additional innovations in data center interconnect architectures will likely be required to meet the demands of large-scale distributed AI workloads.

The Future of AI Data Centers

As NVIDIA continues to innovate, the next generation of AI data centers will need to embrace new networking technologies, reimagine power distribution, and pioneer novel solutions for high-density, high-performance computing. The future of AI isn’t just about more GPUs—it’s about building the infrastructure to support them at scale.

 

Related blog: Insights from GTC25: Networking Could Tip the Balance in the AI Race

The 2024 OCP Global Summit theme was “From Ideas to Impact,” but it could have been “AI Ideas to AI Impact.” Accelerated computing infrastructure was front and center, starting with the keynote and continuing onto the exhibition hall floor and into the breakout sessions. Hyperscalers and the ecosystem of suppliers that support them were eager to share what they have been working on to bring accelerated computing infrastructure and AI workloads to market, at scale. As you might expect with anything AI-related, it drew a crowd: over 7,000 attendees participated in 2024, a significant increase from roughly 4,500 last year. Throughout the crowds, sessions, and expo hall, three key themes stood out to me: power and cooling designs for NVIDIA GB200 NVL racks, an explosion of interest in liquid cooling, and sustainability's continued presence against the AI backdrop.

 

Powering and Cooling NVIDIA GB200 NVL Racks

It’s well known that accelerated computing infrastructure significantly increases rack power densities. This has posed a significant challenge for traditional data center designs, where compute and physical infrastructure are developed and deployed in relative isolation. Deploying accelerated computing infrastructure has forced a rethink, where these boundaries are removed to create an optimized end-to-end system to support next generation “AI factories” at scale. The data center industry is acutely aware this applies to power and cooling, with notable announcements and OCP contributions from industry leaders in how they are addressing these challenges:

  • Meta kicked off the keynote by announcing Catalina, a rack-scale infrastructure design based on NVIDIA GB200 compute nodes. This design increased power requirements from 12–18 kW per rack to 140 kW per system. Unsurprisingly, Catalina utilizes liquid cooling.
  • NVIDIA contributed (open-sourced) elements of its GB200 NVL72 design, including a powerful 1400-amp bus bar for distributing power in the rack, and many liquid cooling contributions related to the manifold, blind mating, and flow rates. Lastly, NVIDIA recognized a new ecosystem of partners focused on the power and cooling infrastructure, highlighting Vertiv’s GB200 NVL72 reference architecture, which enables faster time to deployment, utilizes less space, and increases cooling energy efficiency.
  • Microsoft emphasized the need for liquid cooling for AI accelerators, noting retrofitting challenges in facilities without a chilled water loop. In response, they designed and contributed a custom liquid cooling heat exchanger, which leverages legacy air-based data center heat rejection. This is what I would refer to as air-assisted liquid cooling (AALC), more specifically, an air-assisted coolant distribution unit (CDU), which is becoming increasingly common in retrofitted accelerated computing deployments.
  • Microsoft also announced Mt. Diablo, a collaborative power architecture effort with Meta based on a 400 Vdc disaggregated power rack, which will be contributed to OCP soon. Google likewise highlighted the potential use of 400 Vdc for future accelerated computing infrastructure.
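The motivation for 400 Vdc follows directly from I = P/V: at a fixed rack power, raising the distribution voltage cuts bus-bar current proportionally. Using the 140 kW figure from Meta's Catalina as an illustration:

```python
# I = P / V: at fixed rack power, higher distribution voltage means less current.
RACK_WATTS = 140_000  # Catalina's stated 140 kW per system

for volts in (48, 400):
    amps = RACK_WATTS / volts
    print(f"{volts} Vdc -> {amps:,.0f} A of bus-bar current")
```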

 

Data Center Liquid Cooling Takes Center Stage

Liquid cooling was among the most discussed topics at the summit, mentioned by nearly every keynote speaker, in addition to dozens of breakout sessions dedicated to its growing use in compute, networking, and facility designs. This is justified from my perspective, as Dell'Oro Group previously highlighted liquid cooling as a technology going mainstream, creating a $15 billion market opportunity over the next five years. Furthermore, the ecosystem understands that liquid cooling is not only a growing market opportunity but also a critical technology for enabling accelerated computing and the growth of AI workloads at scale.

It was not just liquid cooling talk: partnerships and acquisitions leading up to and during the Global Summit further cemented the critical role data center liquid cooling will play in the industry's future. This was highlighted in the following announcements:

  • Jabil acquired Mikros Technologies: Kicking off two weeks of big announcements, Jabil's acquisition of Mikros brings together Mikros's expertise in liquid cooling cold plate technology, engineering, and design with Jabil's manufacturing scale. This appears to position Mikros's technology as a high-volume option for hyperscale end-users and the greater data center industry in the near future.
  • Jetcool announced facility CDU, Flex partnership: Jetcool, best known for its air-assisted liquid cooling infrastructure packaged in single servers, introduced a facility CDU (liquid-to-liquid) to keep pace with the market's evolution toward purpose-built AI factories. The partnership brings together a technology specialist and a contract manufacturer to deliver the scale needed to support the liquid cooling needs of hyperscale end-users and the greater data center industry.
  • Schneider Electric acquired Motivair: On the Summit’s final day, Schneider Electric announced its $1.13B acquisition of Motivair. This move, following prior partnerships and organic CDU developments, expands Schneider’s high-density cooling portfolio. This now gives Schneider a holistic power and cooling portfolio to support large-scale accelerated computing deployments, a capability previously exclusive to Vertiv, albeit at a high cost for Schneider.

 

Sustainability Takes a Back Seat but Is Still Very Much Part of the Conversation

While sustainability did not dominate the headlines, it remained a recurring theme throughout the summit. As AI growth drives massive infrastructure expansion, sustainability has become a critical consideration in data center designs. OCP's CEO George Tchaparian characterized sustainability's role alongside AI capex investments best: “Without sustainability, it's not going to sustain.” Other highlights include:

  • OCP announced a new alliance with Net Zero Innovation Hub, an organization focused on net-zero data center innovation in Europe. Details on the alliance were sparse, but more are expected to emerge at the 2025 OCP EMEA Regional Summit.
  • Google shared a collaboration with Meta, Microsoft, and Amazon on green concrete. Most impressively, this collaboration began with a roadmap around the time of last year’s OCP Summit, which resulted in a proof-of-concept deployment in August 2024, reducing concrete emissions by ~40%.
  • A wide range of other sustainability topics were discussed. Improvements in cooling efficiency, water consumption, heat reuse, clean power, lifecycle assessment, and metrics to measure and track progress on data center efficiency and sustainability were all prevalent.

 

Conclusion: Data Center Power and Cooling is Central to the Future of the Data Center Industry

The 2024 OCP Global Summit left me as confident as ever in the growing role power and cooling infrastructure plays in the data center industry. What has emerged is not only improvement of existing technologies, but also the adoption of new technologies and facility architectures. The event's theme, “From Ideas to Impact,” serves as a fitting reminder of how AI is reshaping the industry, with significant implications for the future. As we look ahead, the question isn't just how data centers will power and cool AI workloads, but how they'll do so sustainably, efficiently, and at an unprecedented scale.