
With around 40 vendors rushing into coolant distribution units, liquid cooling is surging—but how many players can the market sustain?

The AI supercycle is not just accelerating compute demand—it’s transforming how we power and cool data centers. Modern AI accelerators have outgrown the limits of air cooling. The latest chips on the market—whether from NVIDIA, AMD, Google, Amazon, Cerebras, or Groq—all share one design assumption: they are built for liquid cooling. This shift has catalyzed a market transformation, unlocking new opportunities across the physical infrastructure stack.

While the concept of liquid cooling is not new—IBM was water-cooling System/360 mainframes in the 1960s—it is only now, in the era of hyperscale AI, that the technology is going truly mainstream. According to Dell’Oro Group’s latest research, the Data Center Direct Liquid Cooling (DLC) market surged 156 percent year-over-year in 2Q 2025 and is projected to reach close to $6 billion by 2029, fueled by the relentless growth of accelerated computing workloads.

As with any fast-growing market, this surge is attracting a flood of new entrants, each aiming to capture a piece of the action. Oil majors are introducing specialized cooling fluids, and thermal specialists from the PC gaming world are pivoting into cold plate solutions. But one product category in particular has become a hotbed of competition: coolant distribution units (CDUs).

 

What’s a CDU and Why Does It Matter?

CDUs act as the hydraulic heart of many liquid cooling systems.

Sitting between facility water and the cold plates embedded in IT systems, these units regulate flow, pressure, and temperature, while providing isolation, monitoring, and often redundancy.

As direct-to-chip liquid cooling becomes a design default for high-density racks, the CDU becomes a mission-critical mainstay for modern data centers.
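
To make that role more concrete, below is a minimal, purely illustrative control-loop sketch in Python. The setpoints, gains, and sensor readings are hypothetical and do not reflect any vendor's implementation; a production CDU would rely on PLC-grade control logic, PID tuning, redundant pumps, and leak detection.

```python
# Illustrative only: a highly simplified CDU control loop with hypothetical setpoints.
# Real CDUs rely on PLC-based control, PID tuning, redundant pumps, and leak detection.

SUPPLY_TEMP_SETPOINT_C = 32.0   # assumed target coolant supply temperature to the cold plates
FLOW_SETPOINT_LPM = 300.0       # assumed target secondary-loop flow rate (liters per minute)

def adjust_facility_valve(valve_pct: float, supply_temp_c: float) -> float:
    """Open the facility-water valve when the loop runs warm, close it when it runs cool."""
    error = supply_temp_c - SUPPLY_TEMP_SETPOINT_C
    return min(100.0, max(0.0, valve_pct + 2.0 * error))   # simple proportional step

def adjust_pump_speed(pump_pct: float, flow_lpm: float) -> float:
    """Trim pump speed toward the flow setpoint."""
    error = FLOW_SETPOINT_LPM - flow_lpm
    return min(100.0, max(20.0, pump_pct + 0.05 * error))

# One control tick with stubbed sensor readings (33.5 °C supply, 280 L/min flow).
valve = adjust_facility_valve(40.0, 33.5)
pump = adjust_pump_speed(70.0, 280.0)
print(f"facility valve: {valve:.1f}%  pump speed: {pump:.1f}%")
```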

 

At Dell’Oro, we have been tracking this market from its early stages, anticipating the shift of liquid cooling from niche to necessity. Our ongoing research has already identified around 40 companies with CDUs within their product portfolios, ranging from global powerhouses to nimble specialists. The sheer number of players raises an important question: is the CDU market becoming overcrowded?

 

Who is currently in the CDU market?

The CDU market is being shaped by players from a wide variety of backgrounds. Some excel in rack system integration, others in high-performance engineering, and still others in manufacturing prowess and scalability. The variety of approaches reflects the diversity of the players themselves—each entering the market from a different starting point, with distinct technical DNA and go-to-market strategies.

Below is a snippet of our CDU supplier map—only a sample of our research to be featured in Dell’Oro’s upcoming Data Center Liquid Cooling Advanced Research Report, expected to be published in 4Q 2025. Our list of CDU vendors is constantly refreshed—it has only been three weeks since the latest launch by a major player, with Johnson Controls announcing its new Silent-Aire series of CDUs.

Not all companies in this list have arrived here organically. The momentum in the CDU market has also fueled a wave of M&A and strategic partnerships. Unsurprisingly, the largest moves have been led by physical infrastructure giants eager to secure a position, as was the case with Vertiv’s acquisition of CoolTera in December 2023 and Schneider Electric’s purchase of Motivair in October 2024.

Beyond these headline deals, several diversified players have taken stakes in thermal specialists—for example, Samsung’s acquisition of FläktGroup and Carrier’s investment in two-phase specialist Zutacore. Private equity has also entered the fray, most notably with KKR’s acquisition of CoolIT. Together, these moves underscore the growing strategic importance of CDU capabilities, even if not every partnership is directly tied to them.

 

Who will win in the CDU market?

Our growth projections are robust, and there is room for multiple vendors to thrive. In the short to medium term, we still expect to see new entrants. Innovators are likely to emerge, developing technologies to address the relentless thermal demands of AI workloads, while nimble players will be quick to capture share in underserved geographies and verticals. Established names such as Vertiv, CoolIT, or Boyd will need to maintain their edge as data center designs and market dynamics evolve.

By the end of the decade, we expect the supply landscape to consolidate as the market matures and capital shifts toward other growth segments. Consolidation and exits are inevitable. We expect fewer than 10 vendors to ultimately capture the lion's share of the market, with the remainder assessing the minimum scale needed to operate sustainably while meeting shareholder expectations—or exiting altogether.

Who will win? There is no single path to success, as data center operators and their applications remain highly diverse. For instance, some had forecast the demise of the in-rack CDU as a subscale solution misaligned with soaring system capacity requirements. Many operators, however, continue to find value in this form factor. A slightly higher partial power usage effectiveness (pPUE), meaning somewhat lower cooling efficiency, can be offset by advantages in modularity, ease of off-site rack integration and commissioning, and containment of faults and leaks.
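
To make that trade-off concrete, here is a small illustrative calculation; the rack load and cooling overheads are assumptions chosen for the example, not measurements of any particular product.

```python
# Back-of-the-envelope pPUE comparison using assumed, not measured, values.
# Partial PUE for the cooling subsystem: pPUE = (IT power + cooling power) / IT power.

def ppue(it_kw: float, cooling_kw: float) -> float:
    return (it_kw + cooling_kw) / it_kw

rack_it_kw = 120.0                      # assumed IT load for one accelerated rack
in_rack_cdu = ppue(rack_it_kw, 9.0)     # assumed 9 kW of pump/fan overhead (in-rack CDU)
row_level_cdu = ppue(rack_it_kw, 7.0)   # assumed 7 kW of overhead per rack (row-level CDU)

print(f"in-rack CDU pPUE ~ {in_rack_cdu:.3f}, row-level CDU pPUE ~ {row_level_cdu:.3f}")
# A gap of a few hundredths in pPUE may be an acceptable trade for modularity,
# simpler off-site rack integration, and smaller fault and leak domains.
```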

Similarly, liquid-to-air (L2A) systems were often described as a transitional technology destined to be quickly superseded by more efficient liquid-to-liquid (L2L) solutions. Yet L2A CDUs have maintained a role even with large operators—ideal for retrofit projects in sites heavily constrained by legacy design choices, with accelerated computing racks operating alongside conventional workloads.

In-rack CDUs, L2A solutions, and other design variations will continue to play a role in a market that is rapidly evolving. GPU requirements are rising year after year, and liquid cooling systems are advancing in step with the capacity demands of next-generation AI clusters. Amid this market flux, several factors are emerging as critical for success.

First, CDUs are not standalone equipment: they are an integral element of a cooling system. Successful vendors take a system-level approach, anticipating challenges across the deployment and leveraging the CDU as hardware tightly integrated with multiple elements to ensure seamless operation. Vendors with proven track records and large installed bases—spanning multiple gigawatts—enjoy an advantage in this regard, as their experience positions them to function as a partner and advisor to their customers, rather than a mere vendor.

Second, success is not just about having the right product—it is about understanding the problem the customer needs solved and developing suitable solutions. Operators face diverse challenges, and a single fleet may need everything from small in-rack CDUs to customized L2A units or even fully skidded multi-megawatt systems. Breadth of portfolio helps hedge across deployment types, but it is not the only path to success. Vendors with a sharp edge in specific technologies can also capture meaningful share.

Lastly, scale and availability are often decisive. As builders race to deliver more compute capacity, short equipment lead times can create opportunities for nimble challengers. Availability goes beyond hardware—it also requires skilled teams to design, commission, and maintain CDUs across global sites, including remote locations outside traditional data center hubs.

As the market evolves, one key question looms: which vendors will adapt and emerge as leaders in this critical segment of the AI infrastructure stack? The answer will shape not just the CDU landscape, but the broader liquid cooling market. We will be following this closely in Dell’Oro’s upcoming Data Center Liquid Cooling Advanced Research Report, expected in 4Q 2025, in which we provide deeper analysis into these dynamics and the broader liquid cooling ecosystem.



NVIDIA recently introduced fully integrated systems, such as the GB200/300 NVL72, which combine Blackwell GPUs with Grace ARM CPUs and leverage NVLink for high-performance interconnects. These platforms showcase what’s possible when the CPU–GPU connection evolves in lockstep with NVIDIA’s accelerated roadmap. As a result, ARM achieved a 25 percent revenue share of the server CPU market in 2Q25, with NVIDIA representing a significant portion due to strong adoption by major cloud service providers.

However, adoption of such proprietary systems may not reach its full potential in the broader enterprise market, as many customers prefer the flexibility of the open ecosystem and established CPU vendors that the x86 architecture offers. Yet the performance of GPU-accelerated applications on x86 has long been constrained by the pace of the PCIe roadmap for both scale-up and scale-out connectivity. While GPUs continue to advance on an 18-month (or shorter) cycle, CPU-to-GPU communication over PCIe has progressed more slowly, often limiting system-level GPU connectivity.

The new Intel–NVIDIA partnership is designed to close this gap. With NVLink Fusion available on Intel’s x86 platforms, enterprises can scale GPU clusters on familiar infrastructure while benefiting from NVLink’s higher bandwidth and lower latency. In practice, this brings x86 systems much closer to the scalability of NVIDIA’s own NVL-based rack designs, without requiring customers to fully commit to a proprietary stack.

For Intel, the agreement ensures continued relevance in the AI infrastructure market despite the lack of a competitive GPU portfolio. For server OEMs, it opens up new design opportunities: they can pair customized Intel x86 CPUs with NVIDIA GPUs in a wider range of configurations—creating more differentiated offerings, from individual boards to full racks—while retaining flexibility for diverse workloads.

The beneficiaries of this development include:
  • NVIDIA, which extends NVLink adoption into the broader x86 ecosystem.
  • Intel, which can play a key role in the AI systems market despite lacking a competitive GPU portfolio, bolstered by NVIDIA’s $5 billion investment.
  • Server OEMs, which gain more freedom to innovate and differentiate x86 system designs.
At the same time, there are competitive implications:
  • AMD is unlikely to participate, as its CPUs compete with Intel's and its GPUs compete with NVIDIA's. The company continues to pursue its own interconnect strategy through UALink.
  • ARM may see reduced momentum for external enterprise AI workloads if x86 platforms can now support higher GPU scalability. That said, cloud providers may continue to use ARM for internal workloads and could explore custom ARM CPUs with NVLink Fusion.

Ultimately, NVLink Fusion on Intel x86 platforms narrows the gap between systems based on a mainstream architecture and NVIDIA’s proprietary designs. It aligns x86 and GPU roadmaps more closely, giving enterprises a more scalable path forward while preserving choice across CPUs, GPUs, and system architectures.


AWS’s In-Row Heat Exchanger (IRHX) is a custom-built liquid cooling system designed for its most powerful AI servers—a system that initially spooked infrastructure investors, but may ultimately strengthen the vendor ecosystem.

On July 9, 2025, Amazon Web Services (AWS) unveiled its in-house-engineered IRHX, a rack-level liquid-cooling platform engineered to support AWS’s highest-density AI training and inference instances built around NVIDIA’s Blackwell GPUs.

The IRHX comprises three building blocks—a water-distribution cabinet, an integrated pumping unit, and in-row fan-coil modules. In industry shorthand, this configuration is a coolant distribution unit (CDU) flanked by liquid-to-air (L2A) sidecars. Direct liquid cooling (DLC) cold plates draw heat directly from the chips; the warmed coolant then flows through the coils of heat exchangers, where high-velocity fans discharge the heat into the hot-aisle containment before the loop recirculates.
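
To give a sense of the physics behind such a loop, here is a minimal sizing sketch based on a simple energy balance; the rack power and temperature rise are assumed values for illustration, not AWS specifications.

```python
# Rough sizing sketch for a DLC loop like the one described above; values are illustrative.
# Energy balance: Q = m_dot * cp * dT, so required coolant flow scales with rack power.

CP_WATER_KJ_PER_KG_K = 4.18   # specific heat of water; a glycol mixture would differ
DENSITY_KG_PER_L = 1.0        # approximate density of water

def coolant_flow_lpm(rack_power_kw: float, delta_t_c: float) -> float:
    """Liters per minute of coolant needed to absorb rack_power_kw at a given temperature rise."""
    kg_per_s = rack_power_kw / (CP_WATER_KJ_PER_KG_K * delta_t_c)
    return kg_per_s / DENSITY_KG_PER_L * 60.0

# Hypothetical example: a 120 kW rack with a 10 °C rise across the cold plates.
print(f"~{coolant_flow_lpm(120.0, 10.0):.0f} L/min of coolant")   # about 172 L/min
```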

IRHX: the AWS cooling solution supporting its NVIDIA Blackwell server deployments (Source: YouTube, https://youtu.be/u81NapG8yL0)

 

Data Center Physical Infrastructure vendors have offered DLC solutions with L2A sidecars for some time now, as a practical retrofit path for operators looking to deploy high-density racks in existing air-cooled environments with minimal disruption. Vertiv offers the CoolChip CDU 70, CoolIT provides a comprehensive line of AHx CDUs, Motivair brands its solution as the Heat Dissipation Unit (HDU™), Boyd sells in-row and in-rack L2A CDUs, and Delta also markets its own L2A options—just to name a few.

“When we looked at liquid cooling solutions based on what’s available in the market today, there were a few trade-offs,” says Dave Brown, VP of Compute & ML Services at AWS. Between long lead times for building greenfield sites and scalability issues with off-the-shelf solutions, AWS chose to develop its own system. “The IRHX was designed to allow us to scale fast by standardizing our equipment and supply chain, and has been built to spec for our standard rack dimensions to fit within our existing data centers.” (Source: YouTube, https://youtu.be/u81NapG8yL0)

While the AWS approach resembles other industry solutions, it introduces several thoughtful innovations laser-focused on its specific needs. Most off-the-shelf systems integrate the pump and heat exchanger coil in a single enclosure, delivering a self-contained L2A unit for one or more racks. In contrast, AWS separates the pumping unit from the fan-coil modules, allowing a single pumping system to support a large number of fan units. These modular fans can be added or removed as cooling requirements evolve, giving AWS flexibility to right-size the system per row and site.

AWS is no stranger to building its own infrastructure solutions. From custom server boards and silicon (Trainium, Inferentia) to rack architectures and networking gear, the scale of its operations justifies a highly vertical approach. The IRHX follows this same pattern: by tailoring CDU capacity and L2A module dimensions to its own rack and row standards, AWS ensures optimal fit, performance, and deployment speed. In this context, developing a proprietary cooling system isn’t just a strategic advantage—it’s a natural extension of AWS’s vertically integrated infrastructure stack.

Market Implications

What does this mean for Data Center Physical Infrastructure vendors? The market reaction was swift—shares of Vertiv (NYSE: VRT) and Munters (STO: MTRS) dropped the day following AWS’s announcement. We do not view Amazon’s move as a threat to vendors in this space, however.

First, it’s important to recognize that the Liquid Cooling market remains buoyant, with considerable room for growth across the ecosystem. Dell’Oro Group’s latest research showed 144% year-over-year growth in 1Q 2025, and our forecast for the liquid cooling segment remains strong. As long as the AI supercycle continues—and we see little risk of it slowing down—the market is expected to remain healthy.

Second, while AWS is an engineering powerhouse, it rarely develops these solutions in isolation. It typically partners with established vendors to co-design its proprietary systems, which are also manufactured by third parties. IRHX may carry the AWS name, but it is likely being built in the facilities of well-known cooling equipment suppliers. Rather than displacing revenue from infrastructure vendors, the IRHX is expected to reinforce it—these vendors are likely playing a key role in its production, and their topline performance should benefit as a result.

Finally, although Dave Brown has stated that the IRHX “can be deployed in existing data centers as well as new builds,” we don’t expect to see it widely used in greenfield facilities designed from the ground up for artificial intelligence. L2A solutions are ideal for retrofitting sites with available floor space and cooling capacity, offering minimal disruption and lower upfront capex. They remain less efficient than liquid-to-liquid (L2L) systems, however, which are likely to stay the architecture of choice in purpose-built AI campuses designed for maximum thermal efficiency and scale.

 


NVIDIA’s Vision for the Future of AI Data Centers: Scaling Beyond Limits

At NVIDIA GTC, Jensen Huang’s keynote highlighted NVIDIA’s growing presence in the data center market, which is projected to surpass $1 trillion by 2028, referencing Dell’Oro Group’s forecast. NVIDIA is no longer just a chip vendor; it has evolved into a provider of fully integrated, rack-scale solutions that encompass compute, networking, and thermal management. During GTC, NVIDIA also announced an AI Data Platform that integrates enterprise storage with NVIDIA accelerated computing, enabling AI agents to deliver real-time business insights to enterprise customers. This transformation is redefining how AI workloads are deployed at scale.

Jensen Huang delivering the GTC 2025 keynote (Source: NVIDIA GTC 2025)

The Blackwell Platform: Optimized for AI Training and Reasoning

NVIDIA’s Blackwell platform represents a major leap in AI acceleration. Not only does it excel at training deep learning models, but it is also optimized for inference and reasoning—two key drivers of hyperscale capital expenditure growth in 2025. Reasoning models, which generate a large number of tokens, operate differently from conventional AI models. Unlike traditional AI that directly answers queries, reasoning models use “thinking tokens” to process and refine their responses, mimicking cognitive reasoning. This process significantly increases computational demands.
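
As a rough, purely hypothetical illustration of why this matters for capacity planning, the sketch below compares generated-token counts for a direct answer versus a reasoning-style response; the token counts are invented for the example and will vary widely by model and prompt.

```python
# Toy illustration with assumed token counts; decode-side compute grows roughly in
# proportion to the number of tokens a model generates per request.

direct_answer_tokens = 300      # assumed: a conventional model answers directly
thinking_tokens = 4_000         # assumed: "thinking tokens" emitted before the final answer
final_answer_tokens = 300

ratio = (thinking_tokens + final_answer_tokens) / direct_answer_tokens
print(f"~{ratio:.0f}x more generated tokens per query")   # about 14x in this toy example
```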

The Evolution of Accelerated Computing

The unit of accelerated computing is evolving rapidly. It started with single accelerators, progressed to integrated servers like the NVIDIA DGX, and has now reached rack-scale solutions like the NVIDIA GB200 NVL72. Looking ahead, NVIDIA aims to scale even further with the upcoming Vera Rubin Ultra platform, featuring 576 GPUs interconnected in a rack. Scaling up AI clusters introduces new challenges in interconnects and power density, and as compute nodes scale into the hundreds of thousands (and beyond), the industry will need to address several key challenges:

1) Increasing Rack Density

AI data centers aim to pack GPUs as closely as possible to create a coherent compute fabric for large language model (LLM) training and real-time inference. The NVL72 already features extremely high density, necessitating liquid cooling for heat dissipation. With further scaling, interconnect distances will increase. The question arises: will copper cabling remain viable, or will the industry need to transition to optical interconnects, despite their higher cost and power inefficiencies?

2) The Shift to Multi-Die GPUs

To boost computational capacity, increasing GPU die size has been one approach. However, with the Vera Rubin platform, GPUs have already reached the reticle limit, necessitating a shift to multi-die architectures. This will increase the physical footprint and interconnect distance, posing further engineering challenges.

3) Surging Rack Power Density

As GPU size and node count increase, rack power density is skyrocketing. NVIDIA’s GB200 NVL72 racks already consume 132 kW, and the upcoming Rubin Ultra NVL576 is projected to require 600 kW per rack. Given that AI data centers typically operate within a 50 MW range, fewer than 100 racks of that class can be housed in a single facility. This constraint demands a new approach to scaling AI infrastructure.
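
A quick back-of-the-envelope check of that math, treating the facility budget as a simplifying assumption rather than a design rule:

```python
# Quick check of the rack-count math above (illustrative; real facilities also reserve
# capacity for networking, storage, cooling overhead, and power-conversion losses).

facility_budget_kw = 50_000        # "50 MW range" facility, treated here entirely as rack power
gb200_nvl72_rack_kw = 132          # current Blackwell-generation rack
rubin_ultra_rack_kw = 600          # projected Rubin Ultra rack

print(facility_budget_kw // gb200_nvl72_rack_kw)   # about 378 racks today
print(facility_budget_kw // rubin_ultra_rack_kw)   # about 83 racks -> "fewer than 100"
```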

4) Disaggregating AI Compute Across Data Centers

As power limitations become a bottleneck, AI clusters may need to be strategically distributed across multiple data centers based on power availability. This introduces the challenge of interconnecting these geographically dispersed clusters into a single virtual AI compute fabric. Coherent optics and photonics-based networking may be necessary to enable low-latency interconnects between data centers separated by miles. NVIDIA’s recently introduced silicon photonics switch may be part of this solution, at least from the standpoint of lowering power consumption, but additional innovations in data center interconnect architectures will likely be required to meet the demands of large-scale distributed AI workloads.

The Future of AI Data Centers

As NVIDIA continues to innovate, the next generation of AI data centers will need to embrace new networking technologies, reimagine power distribution, and pioneer novel solutions for high-density, high-performance computing. The future of AI isn’t just about more GPUs—it’s about building the infrastructure to support them at scale.

 

Related blog: Insights from GTC25: Networking Could Tip the Balance in the AI Race

With a wave of announcements coming out of GTC, countless articles and blogs have already covered the biggest highlights. Rather than simply rehashing the news, I want to take a different approach—analyzing what stood out to me from a networking perspective. As someone who closely tracks the market, it is clear to me that AI workloads are driving a steep disruption in networking infrastructure. While a number of announcements at GTC25 were compute-related, NVIDIA made it clear that deployments of next-generation GPUs and accelerators would not be possible without major innovations on the networking side.

1) The New Age of AI Reasoning Driving 100X More Compute Than a Year Ago

Jensen highlighted how the new era of AI reasoning is driving the evolution of scaling laws, transitioning from pre-training to post-training and test-time scaling. This shift demands an enormous increase in compute power to process data efficiently. At GTC 2025, he emphasized that the required compute capacity is now estimated to be 100 times greater than what was anticipated just a year ago.

2) The Network Defines the AI Data Center

The way AI compute nodes are connected will have profound implications for efficiency, cost, and performance. Scaling up, rather than scaling out, offers the lowest latency, cost, and power consumption when connecting accelerated nodes in the same compute fabric. At GTC 2025, NVIDIA unveiled plans for its upcoming NVLink 6/7 and NVSwitch 6/7, key components of its next-generation Rubin platform, reinforcing the critical role of NVLink switches in its strategy. Additionally, the Spectrum-X switch platform, designed for scaling out, represents another major pillar of NVIDIA’s vision (see chart below). NVIDIA is committed to a “one-year rhythm,” with networking keeping pace with GPU requirements. Other key details from NVIDIA’s roadmap announcement also caught our attention, and we are excited to share these with our clients.

Source: NVIDIA GTC25

 

3) Power Is the New Currency

The industry is more power-constrained than ever. NVIDIA’s next-generation Rubin Ultra is designed to accommodate 576 dies in a single rack, consuming 600 kW—a significant jump from the current Blackwell rack, which already requires liquid cooling and consumes between 60 kW and 120 kW. Additionally, as we approach 1 million GPUs per cluster, power constraints are forcing these clusters to become highly distributed. This shift is driving an explosion in the number of optical interconnects, both intra- and inter-data center, which will exacerbate the power challenge. NVIDIA is tackling these power challenges on multiple fronts, as explained below.

4) Liquid-Cooled Switches Will Become a Necessity, Not a Choice

After liquid cooling racks and servers, switches are next. NVIDIA’s latest 51.2 Tbps Spectrum-X switches offer both liquid-cooled and air-cooled options. However, all future 102.4 Tbps Spectrum-X switches will be liquid-cooled by default.

5) Co-packaged Optics (CPO) in Networking Chips Before GPUs

Another key reason for liquid cooling racks is to maximize the number of GPUs within a single rack while leveraging copper for short-distance connectivity—”Copper when you can, optics when you must.” When optics are necessary, NVIDIA has found a way to save power with Co-Packaged Optics (CPO). NVIDIA plans to make CPO available on its InfiniBand Quantum switches in 2H25 and on its Spectrum-X switches in 2H26. However, NVIDIA will continue to support pluggable optics across different SKUs, reinforcing our view that data centers will adopt a hybrid approach to balance performance, efficiency, and flexibility.

Source: NVIDIA GTC25

 

6) Impact on Ethernet Switch Vendor Landscape

According to our AI Networks for AI Workloads report, three major vendors dominated the Ethernet portion of the AI Network market in 2024.

However, over the next few years, we anticipate greater vendor diversity at both the chip and system levels. We expect photonic integration in switches to introduce a new dimension, potentially reshaping the dynamics of an already vibrant vendor landscape. We foresee a rapid pace of innovation in the coming years—not just in technology, but at the business model level as well.

Networking could be the key factor that shifts the balance of power in the AI race, and customers’ appetite for innovation and cutting-edge technologies is at an unprecedented level. As one hyperscaler put it during a panel at GTC 2025: “AI infrastructure is not for the faint of heart.”

For more detailed views and insights on the AI Networks for AI Workloads report, please contact us at dgsales@delloro.com.