[wp_tech_share]

 

A new year is always a good time to look back on the previous year and consider what it means for the year ahead. It is an especially exciting time for the Data Center Physical Infrastructure research program at Dell’Oro Group, which marked its first publication with the Q3 2021 report. While we did not make any predictions for data center physical infrastructure in 2021, we can certainly recap the year before turning to our 2022 predictions.

For the data center physical infrastructure market, 2021 can be split into two major themes. During the first half of 2021, the market for data center physical infrastructure rebounded strongly, growing 17.7% to $10 billion after a pandemic-induced market dip in 2020. Year-over-year comparisons were favorable, but it was cloud service provider investment and rebounding enterprise spending in North America and EMEA that drove the market past 2019 levels. The story changed in the second half of 2021, however. The new COVID-19 variants Delta and Omicron reared their ugly heads, while supply chains began to break down, leading to shortages of components and products, raw material price increases, and labor and logistics issues. We forecast that this slowed data center physical infrastructure growth to 4.7%, with the market reaching $11.4 billion in revenue during the second half of 2021. Data center physical infrastructure vendors entered 2022 with record backlogs, but questions remain about how much of that backlog they will be able to deliver as demand continues to outpace supply. While supply chain issues will likely persist throughout 2022, what else does the data center physical infrastructure market have in store for us?

1. Plans to Reach Long-Term Data Center Sustainability Goals Begin to Materialize

As the global COVID-19 pandemic accelerated digital adoption and growth throughout 2021, it also cast a large shadow on the growing climate impact of data centers. It’s no wonder sustainability quickly became one of the most common buzzwords in the industry. The data center industry responded by aggressively expanding sustainability commitments, which had previously been tied largely to 100% renewable energy offset credits. Renewable energy goals transitioned from 100% renewable energy offsets to 100% renewable energy consumption. Data center water usage also came under fire, with Microsoft notably pledging to cut water usage 95% by 2024 and become water positive by 2030. But by far the most common goal set by data center owners and operators was to become carbon neutral, or in some cases carbon negative, by 2030. Critics were quick to point out the difficult path to achieving those goals, with details on how to get there remaining sparse. We expect 2022 to bring more clarity on some of the technologies that will enable progress toward those goals, with data center physical infrastructure playing a big role in a number of areas:

    • Backup power connects to the grid – A large portion of data center physical infrastructure is dedicated to providing clean, uninterruptible power to IT infrastructure, even during a utility power outage, through the use of UPS systems, batteries, and generators. Those systems largely sit idle while utility power is available. That is beginning to change, spurred by the adoption of lithium-ion batteries, which are creating new energy storage use cases at data center facilities. This technology, commonly referred to as grid-interactive UPS, will enable those idle assets to become revenue-generating or cost-saving through peak shaving, frequency regulation, and other grid participation activities, in addition to supporting better integration of renewable energy (a simple peak-shaving sketch follows this list). Microsoft and Eaton have publicly collaborated on grid-interactive UPS, recently releasing a white paper on the subject. We predict major strides in grid-interactive UPS systems in 2022, with details and an ecosystem forming around early pilots to support execution of larger-scale rollouts.
    • Fuel cells replace generators – Okay, this isn’t happening in 2022. But the recent announcement that Vertiv, Equinix, and other utility, fuel cell, and research partners are working on a proof-of-concept (POC) fuel cell use case for data centers, funded by the Clean Hydrogen Partnership, certainly creates some excitement. Vertiv has committed to providing a 100 kW fuel cell module with an integrated UPS by 2023. Here’s hoping we get updates throughout the year on how fuel cell technology can be applied to data centers, and on what timeline.
    • Data center heat re-use bubbles up to the top of sustainability priorities – Data centers consume a lot of power and, in turn, generate a lot of heat. Today, air-based thermal management systems capture that heat and reject it into the atmosphere. However, there is a significant opportunity to re-use that heat, with district heating and urban farming as commonly cited examples. The difficulty in scaling data center heat re-use is that today’s thermal management designs and infrastructure largely don’t support it. In 2022, we predict that will change, with heat re-use capability being designed into new products and data center architectures. To take full advantage of heat re-use, data center owners and ecosystem vendors will turn to liquids, which transfer energy up to ten times more efficiently than air.
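To make the grid-interactive UPS idea above more concrete, here is a minimal, hypothetical sketch of a peak-shaving decision loop in Python. The threshold, battery capacity, inverter limit, and load profile are all illustrative assumptions, not figures from Microsoft, Eaton, or any vendor implementation; a production system would also reserve battery capacity for outage ride-through.

```python
# Hypothetical peak-shaving sketch: discharge the UPS battery whenever facility
# load exceeds a contracted peak threshold, and recharge when there is headroom.
# All numbers are illustrative assumptions; a real system would hold back a
# reserve for outage ride-through and obey charge-rate limits.

PEAK_THRESHOLD_KW = 900.0      # assumed contracted peak demand
BATTERY_CAPACITY_KWH = 500.0   # assumed usable lithium-ion capacity
MAX_DISCHARGE_KW = 250.0       # assumed inverter limit
INTERVAL_H = 0.25              # 15-minute metering interval

def peak_shave(load_kw, soc_kwh):
    """Return (grid_draw_kw, new_soc_kwh) for one metering interval."""
    if load_kw > PEAK_THRESHOLD_KW and soc_kwh > 0:
        discharge = min(load_kw - PEAK_THRESHOLD_KW, MAX_DISCHARGE_KW,
                        soc_kwh / INTERVAL_H)
        return load_kw - discharge, soc_kwh - discharge * INTERVAL_H
    if load_kw < PEAK_THRESHOLD_KW and soc_kwh < BATTERY_CAPACITY_KWH:
        recharge = min(PEAK_THRESHOLD_KW - load_kw,
                       (BATTERY_CAPACITY_KWH - soc_kwh) / INTERVAL_H)
        return load_kw + recharge, soc_kwh + recharge * INTERVAL_H
    return load_kw, soc_kwh

if __name__ == "__main__":
    soc = BATTERY_CAPACITY_KWH
    for load in [800, 950, 1050, 1000, 850]:  # simulated facility load (kW)
        grid, soc = peak_shave(load, soc)
        print(f"load={load:6.1f} kW  grid draw={grid:6.1f} kW  battery={soc:6.1f} kWh")
```

The same decision point is where frequency-regulation or other grid-services signals would plug in if the UPS were participating in those programs.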

2. Liquid Cooling Adoption Momentum Continues as POC Deployments Proliferate and Early Adopters Begin Larger Roll Outs

Traditionally, the data center industry has been conservative in adopting new physical infrastructure technologies. Interested in bringing liquids into my IT space, let alone into the IT rack? Absolutely not. However, as the gains from Moore’s Law have slowed, data center rack densities have started to rise. In the high-performance computing (HPC) space, air cooling simply wasn’t an option anymore as HPC rack densities surpassed 20 kW, 50 kW, and even 100 kW in some cases. This trend formed the foundation of today’s liquid cooling market, which includes both direct liquid cooling (pumping liquid to cold plates attached directly to CPUs, GPUs, and memory) and immersion cooling (submerging an entire rack of servers in a liquid-filled tank).

Liquid cooling market revenue growth accelerated in 2021, with revenue growing an estimated 64.3% from 2020 to $113M. Another 25% of growth is forecast for 2022, taking the market to $141M despite constrained supply chains. This growth is forecast to be driven by proliferating POCs from cloud, colocation, and telco service providers, in addition to large enterprises dipping their toes in. For early adopters, larger-scale rollouts of liquid cooling technology are forecast to begin, supported by increased awareness of and comfort in operating liquid-cooled data centers. With momentum continuing to build, an inflection point for liquid cooling adoption appears near.

3. Supply Chain Resiliency and Integrated Solutions Drive Mergers, Acquisitions, and Partnerships

Supply chain discussions are creeping into nearly every conversation these days, so we can’t have 2022 predictions without assessing what impact they might have on the year. First, we do believe supply chain issues will persist throughout 2022, and potentially into 2023. However, we predict their lasting impact on the year will be from the mergers, acquisitions, and partnerships they drive.

Supply chain disruptions have become commonplace over the past three years. Since the onset of US-China trade war tensions, data center physical infrastructure vendors have been localizing supply chains in region, for region. The pandemic has only added more unpredictability to global supply chains, exposing further weaknesses. To address these weaknesses, we predict a flurry of mergers and acquisitions. We believe these acquisitions will be focused on supply chain resiliency, establishing and growing manufacturing footprints in select regions, while also supporting the delivery of holistic data center solutions at the rack, row, pod, or building level. Any potential acquisition that checks multiple of these boxes will look quite appetizing in 2022.

At the beginning of next year, we’ll circle back and see how we did on our predictions. In the meantime, stay connected with the data center physical infrastructure program for the latest updates.

[wp_tech_share]

The Nvidia GTC Fall 2021 virtual event I attended last week highlighted some exciting developments in the field of AI and machine learning, most notably, in new applications for the metaverse. A metaverse is a digital universe created by the convergence of the real world and a virtual world abstracted from virtual reality, augmented reality, and other 3D visual projections.

Several leading Cloud service providers recently laid out their visions of the metaverse. Facebook, which changed its name to Meta to align its focus on the metaverse, envisions people working, traveling, and socializing in virtual worlds. Microsoft already offers holograms and mixed reality on its Microsoft Mesh platform and announced plans to bring holograms and virtual avatars to Microsoft Teams next year. Tencent recently shared its metaverse plan to leverage its strengths in multiplayer gaming on its social media platform.

In order to recreate an accurate virtual representation of the real world, massive amounts of AI training data would need to be acquired, captured, and processed. This would stretch the limits of the compute infrastructure. During GTC, Nvidia highlighted various solutions in three areas that could help pave the way for the proliferation of the metaverse in the near future:

  • Compute Architecture: During the Q&A session, I asked Nvidia CEO Jensen Huang how the data center would need to evolve to meet the needs of the metaverse. Jensen emphasized that computer vision, graphics, and physics simulation would need to converge in a coherent architecture and be scaled out to millions of people. In a sense, this would be a new type of computer, a fusion of various disciplines with the data center as the new unit of computing. In my view, such an architecture would be composed of a large cluster of accelerated servers with multiple GPUs within a network of tightly coupled, general-purpose servers. The servers would run applications and store massive amounts of data. Memory-coherent interfaces, such as CXL, NVLink, or their future iterations, offered on x86- and ARM-based platforms, would enable memory sharing across racks and pods. These interfaces would also improve connectivity between CPUs and GPUs, reducing system bottlenecks.
  • Network Architecture: As the unit of computing continues to scale, new network architectures will need to be developed. During GTC, Nvidia introduced Quantum-2, a networking platform composed of 400 Gbps InfiniBand switching and the BlueField-3 DPU (data processing unit) Smart NIC. This combination will enable high-throughput, low-latency networking in dense, tightly coupled clusters scaling up to the one million nodes needed for metaverse applications. 400 Gbps is the fastest server access speed available today, and it could double to 800 Gbps within several years. The ARM processor in the BlueField DPU can directly access the network interface, bypassing the host CPU and benefiting time-sensitive AI workloads. Furthermore, we can expect these scaled-out computing clusters to be shared across multiple users. With a Smart NIC such as the BlueField DPU, a layer of isolation can be provided among users, thereby enhancing security.
  • Omniverse: The compute and network infrastructure can only be effectively utilized with a solid software development platform and ecosystem in place. Nvidia’s Omniverse provides the platform that enables developers and enterprises to create and connect virtual worlds for various use cases. During GTC, Jensen described how the Omniverse could be applied to build a digital twin of an automotive factory, with the manufacturing process simulated and optimized by AI; the twin would later serve as the blueprint for the physical construction. Potential applications range from education to healthcare, retail, and beyond.

We are still in the initial developmental stages of the metaverse; the technology building blocks and ecosystem are still coming together. Furthermore, as we have seen recently with certain social media platforms and the gaming industry, new regulations could emerge to reset the boundaries between the real and virtual worlds. Nevertheless, I believe the metaverse has the potential to unlock new use cases for both consumers and enterprises and to drive investments in data center infrastructure in the Cloud and Enterprise. To access the full Data Center Capex report, please contact us at dgsales@delloro.com.

[wp_tech_share]

Dell’Oro Group projects that spending on accelerated compute servers targeted at artificial intelligence (AI) workloads will grow at a double-digit rate over the next five years, outpacing other data center infrastructure. An accelerated compute server, equipped with accelerators such as GPUs, FPGAs, or custom ASICs, can generally handle AI workloads with much greater efficiency than a general-purpose server (one without accelerators). Numerically, these servers still represent only a fraction of Cloud service providers’ overall server footprint. Yet, at ten or more times the cost of a general-purpose server, accelerated compute servers are becoming a substantial portion of data center capex.
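As a rough, hypothetical illustration of that last point: if accelerated servers were 5% of units shipped but cost ten times as much as a general-purpose server, they would account for roughly a third of server capex. The unit share and cost multiple below are assumptions for illustration, not figures from the report.

```python
# Hypothetical illustration: share of server capex captured by accelerated servers.
# The unit share and cost multiple are assumptions, not report figures.

accel_unit_share = 0.05   # assumed: 5% of server units shipped are accelerated
cost_multiple = 10.0      # assumed: each costs 10x a general-purpose server

accel_spend = accel_unit_share * cost_multiple
general_spend = (1 - accel_unit_share) * 1.0
capex_share = accel_spend / (accel_spend + general_spend)
print(f"Accelerated servers' share of server capex: {capex_share:.0%}")  # ~34%
```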

Tier 1 Cloud service providers are increasing their spending on new infrastructure tailored for AI workloads. On Facebook’s 3Q21 earnings call, the company announced plans to increase capex by more than 50% in 2022. Investments will be driven by AI and machine learning to improve ranking and recommendations across Facebook’s platform. In the longer term, as the company shifts its business model toward the metaverse, capex investments will be driven by video and compute-intensive applications such as AR and VR. At the same time, Tier 1 Cloud service providers such as Amazon, Google, and Microsoft also aim to increase spending on AI-focused infrastructure to enable their enterprise customers to deploy applications with enhanced intelligence and automation.

It has been a year since my last blog on AI data center infrastructure. Since that time, new architectures and solutions have emerged that could pave the way for the further proliferation of AI in the data center. Following are three innovations I’ll be watching closely:

New CPU Architectures

Intel is scheduled to launch its next-generation Sapphire Rapids processor next year. With its AMX (Advanced Matrix Extensions) instruction set, Sapphire Rapids is optimized for AI and ML workloads. CXL, which will be offered with Sapphire Rapids for the first time, will establish a memory-coherent, high-speed link over the PCIe Gen 5 interface between the host CPU and accelerators. This, in turn, will reduce system bottlenecks by enabling lower latencies and more efficient sharing of resources across devices. AMD will likely follow on the heels of Intel and offer CXL on EPYC Genoa. For ARM, competing coherent interfaces will also be offered, such as CCIX with Ampere’s Altra processor and NVLink on Nvidia’s upcoming Grace processor.

Faster Networks and Server Connectivity

AI applications are bandwidth hungry. For this reason, the fastest networks available would need to be deployed to connect host servers to accelerated servers, facilitating the movement of large volumes of unstructured data and training models (a) between the host CPU and accelerators, and (b) among accelerators in a high-performance computing cluster. Some Tier 1 Cloud service providers are deploying 400 Gbps Ethernet networks and beyond. The network interface card (NIC) must also evolve to ensure that server connectivity does not become a bottleneck as data sets grow larger. 100 Gbps NICs have been the standard server access speed for most accelerated compute servers. Most recently, however, 200 Gbps NICs have increasingly been used for these high-end workloads, especially by Tier 1 Cloud service providers. Some vendors have added another layer of performance by integrating accelerated compute servers with Smart NICs, or Data Processing Units (DPUs). For instance, Nvidia’s DGX system can be configured with two BlueField-2 DPUs to facilitate packet processing of large datasets and provide multi-tenant isolation.
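To put those NIC speeds in perspective, here is a back-of-the-envelope calculation of how long it would take to move a training dataset across a single NIC at different speeds. The dataset size and link efficiency are illustrative assumptions, not measured figures.

```python
# Back-of-the-envelope transfer times for a training dataset at various NIC speeds.
# Dataset size and achievable link efficiency are illustrative assumptions.

DATASET_TB = 10          # assumed dataset size, terabytes
LINK_EFFICIENCY = 0.90   # assumed achievable fraction of line rate

dataset_bits = DATASET_TB * 1e12 * 8
for speed_gbps in (25, 50, 100, 200):
    seconds = dataset_bits / (speed_gbps * 1e9 * LINK_EFFICIENCY)
    print(f"{speed_gbps:>3} Gbps NIC: ~{seconds / 60:5.1f} minutes")
```

Under these assumptions, moving from 100 Gbps to 200 Gbps roughly halves the transfer time from about 15 minutes to about 7, which is why access speeds keep climbing alongside accelerator performance.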

Rack Infrastructure

Accelerated compute servers, generally equipped with four or more GPUs, tend to be power hungry. For example, an Nvidia DGX system with 8 A100 GPUs has a maximum rated system power usage of 6.5 kW. Extra consideration is needed to ensure efficient thermal management. Today, air-based thermal management infrastructure is predominantly used. However, as rack power densities rise to support accelerated computing hardware, the efficiency limits of air cooling are being reached. Novel liquid-based thermal management solutions, including immersion cooling, are under development to further enhance the thermal efficiency of accelerated compute servers.
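For a rough sense of why air cooling runs out of headroom, consider how many such 6.5 kW systems fit under typical rack power budgets. The budgets below are illustrative assumptions, not figures from any specific facility.

```python
# Illustrative rack-density math: how many 6.5 kW accelerated systems fit under
# different rack power budgets. Rack budgets are assumptions for illustration.

SYSTEM_KW = 6.5  # maximum rated power of the accelerated server cited above

for rack_budget_kw in (10, 15, 30, 50):
    systems = int(rack_budget_kw // SYSTEM_KW)
    print(f"{rack_budget_kw:>2} kW rack budget: {systems} system(s), "
          f"{systems * SYSTEM_KW:.1f} kW of IT load")
```

At the lower budgets common in air-cooled enterprise facilities, most of the rack is left unpopulated, which is exactly the density pressure pushing operators toward liquid-assisted approaches.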

These technology trends will continue to evolve and drive the commercialization of specialized hardware for AI applications. Please stay tuned for more updates from the upcoming Data Center Capex reports.

[wp_tech_share]

The data center industry is estimated to have consumed 205 terawatt-hours (TWh), or roughly 1% of the world’s electricity consumption, in 2018. Other industry estimates peg that rate higher, at up to ~2%. Despite these differing estimates, one thing is clear: the decade-old fear of runaway growth in data center energy consumption has proved to be unfounded. Hyperscale cloud service providers (CSPs) have largely managed that concern, with the help of industry vendors, through IT virtualization and higher utilization of power and cooling infrastructure. At the same time, enterprise data center operations, while historically less efficient, have increasingly transitioned to CSPs.

However, these estimates were calculated before the global COVID-19 pandemic, which saw the world embrace virtual collaboration, remote learning, and accelerated automation through artificial intelligence (AI) and machine learning (ML). As these trends materialized throughout 2020 and the industry (barely) managed to keep up with demand, questions resurfaced about managing future energy consumption. For this reason, data center sustainability has become the most pressing issue in the data center industry, and one in which data center physical infrastructure vendors believe they can play a critical role.

As part of Dell’Oro Group’s upcoming Data Center Physical Infrastructure program, we will focus on technologies that enable sustainable data center growth. Data center thermal management, which accounts for 30% to 40% of a data center’s annual energy consumption, second only to compute, is the logical starting point. Today, air-based thermal management infrastructure is predominantly used. However, as rack power densities rise to support accelerated computing hardware (such as GPUs and FPGAs), the efficiency limits of air cooling are being reached. Liquids are a much more effective and efficient medium for transferring heat. For this reason, the data center industry is exploring different ways to safely bring liquids into the data center.
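One way to see what a 30% to 40% cooling share implies is to translate an assumed facility energy split into power usage effectiveness (PUE), the ratio of total facility energy to IT energy. The split below is an assumed example for illustration, not a measurement from any facility.

```python
# Translate an assumed facility energy split into PUE (total energy / IT energy).
# The percentages are illustrative assumptions, not measured values.

energy_share = {
    "IT compute": 0.55,
    "thermal management": 0.35,
    "power losses & other": 0.10,
}
pue = 1.0 / energy_share["IT compute"]
print(f"Assumed cooling share: {energy_share['thermal management']:.0%} -> PUE ~ {pue:.2f}")
```

Shrinking the cooling slice pushes PUE toward 1.0, which is why thermal management is the logical lever for sustainability gains.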

That’s why, when I had the opportunity to tour CGG’s High Performance Compute Center, I felt a level of nervousness and excitement I haven’t experienced before visiting a data center in some time. It was my first time inside a liquid immersion-cooled facility, one supported by Green Revolution Cooling’s (GRC) infrastructure. GRC is a recognized leader in immersion-cooling technology, alongside Asperitas, Submer, and other vendors. Visiting my first immersion-cooled facility felt more like a trip to Mars than the type of data center I’ve spent my entire career getting to know.

Although the data center industry often treats liquid cooling as new, its use in computing has actually been around for decades, dating back to when it was used to cool IBM mainframes. Immersion cooling seeks to solve a similar problem today – removing heat directly at the source – but through a different method. A coolant distribution unit (CDU) pumps a liquid – usually some kind of mineral oil – to a rack manifold, which fills and circulates the fluid through the rack (sometimes referred to as a vat or tank). Servers, which require some modification, are then vertically immersed in the liquid, which captures and removes 100% of the generated heat. Right now, the big question the data center industry is asking is: how different does immersion cooling make my data center?

CGG Doubles Compute Capacity with Immersion Cooling

Walking into the CGG High Performance Compute Center, any notion that I was headed to Mars was quickly dispelled. It looked like a conventional data center, with a raised floor and traditional infrastructure from the UPS down to the rack power distribution units (rPDUs). The big difference was that the immersion racks were horizontal rather than vertical. As I took in the room, the first thing I noticed was how quiet it was. The CDU pumps produced the only noise, and it was easy to hold a conversation with the person standing next to me. The horizontal immersion racks also created an open feeling, allowing me to see across the entire room.

However, a friendlier operating environment isn’t what drove CGG to adopt immersion cooling. The company had reached its limits of space, power, and cooling. In order to expand computing capacity, CGG needed more space and power or a new thermal-management solution. And the new thermal management solution – immersion cooling – did not disappoint. In the same floor space and power footprint, CGG was able to double its computing capacity. Additionally, a significant portion of the existing infrastructure was utilized, while deploying immersion racks in scalable, 100 kW cooling-capacity increments. As a result, CGG had no downtime and only limited capital expenditures (CAPEX) during the transition to immersion cooling.

These benefits aren’t unique to CGG’s deployment of immersion cooling. In fact, they can be achieved by many players in the data center industry struggling with space, power, or cooling constraints. To quantify the benefits: CAPEX for the construction of a new immersion-cooled data center can be roughly 20% lower than for a traditional air-cooled build. This is the result of eliminating certain infrastructure, such as chillers and air handlers, in addition to downsizing electrical infrastructure such as UPSs, switchgear, and power distribution.

The case for immersion cooling becomes even more compelling when considering operational expenditures (OPEX). Immersion-cooling systems use less power as a result of removing server fans, air handling units, and chilled water systems. Lower power consumption for thermal management means reduced annual energy costs. Additionally, with fewer moving parts in an immersion-cooling solution, maintenance costs are also reduced. In total, immersion-cooling OPEX can be up to 33% lower than in a traditional air-cooled data center build. From a total cost of ownership (TCO) perspective over the 10-year life of a data center, an immersion-cooled facility can achievably cost half as much as a traditional air-cooled build.
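A back-of-the-envelope model shows how those reductions compound over a facility’s life. The baseline costs below are illustrative assumptions, not CGG or GRC figures; note that reaching the ~50% TCO level also leans on the kind of density gain CGG achieved, which lowers the cost per unit of compute rather than the cost of the building alone.

```python
# Back-of-the-envelope 10-year TCO comparison of air-cooled vs immersion-cooled
# builds. Baseline costs, reduction factors, and the capacity gain are
# illustrative assumptions, not vendor or CGG figures.

YEARS = 10
air_capex = 10.0                 # assumed baseline build cost ($M)
air_opex_per_year = 1.5          # assumed baseline annual opex ($M)

imm_capex = air_capex * (1 - 0.20)                   # ~20% capex reduction cited above
imm_opex_per_year = air_opex_per_year * (1 - 0.33)   # ~33% opex reduction cited above
capacity_gain = 2.0              # assumed: 2x compute in the same footprint, as at CGG

air_tco = air_capex + air_opex_per_year * YEARS
imm_tco = imm_capex + imm_opex_per_year * YEARS

print(f"10-year TCO, air-cooled:       ${air_tco:.1f}M")
print(f"10-year TCO, immersion-cooled: ${imm_tco:.1f}M "
      f"({1 - imm_tco / air_tco:.0%} lower)")
print(f"TCO per unit of compute, immersion vs air: "
      f"{(imm_tco / capacity_gain) / air_tco:.0%} of the air-cooled cost")
```

Under these assumptions the facility itself costs roughly 28% less over ten years, and once the doubled compute capacity is factored in, the cost per unit of compute lands well below half of the air-cooled baseline.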

Immersion Cooling Brings Small Changes to Data Center Operations

So, what’s the catch? The human element of operations in the mission-critical data center industry can’t be overlooked. Data center uptime is measured by the number of nines (e.g., 99.9% vs. 99.9999% uptime), as downtime can translate into hundreds of thousands of dollars – or even millions – in lost revenue. Historically, this has led to slow adoption of new technologies. Early adopters are often driven by need, as is the case with liquid cooling for HPC. But with increased adoption of accelerated compute, many other companies are already struggling, or are expected to struggle, with the limits of air cooling in the near future.

In my visit to CGG’s High Performance Compute Center, I was most eager to learn about the “quirks” of immersion cooling. The biggest difference from air-cooled builds is in server maintenance. Servers have to be pulled out of the oil by hand or using a small, overhead lift. They can then be laid across the tank while work is performed, either immediately or after a short period of drip drying. After maintenance is complete, the server is simply immersed back into the rack.

Other operational differences that data center owners and operators must consider are:

  • Containment of the oil in which servers are immersed is top of mind. For CGG, this didn’t appear to be a problem. Different combinations of rack, row, and room containment are used to manage any dripping when removing servers. It’s handy to keep a roll of oil-absorbent towels around, but no major spills have occurred.
  • Stickers imprinted with a server’s serial number can come loose during immersion. This seemed to be the biggest potential headache. A loose sticker doesn’t damage the immersion-cooling system, thanks to the filtration system, but a missing sticker can complicate asset management. Some immersion-ready servers already utilize a pull-tag system, which eliminates the issue, and oil-resistant stickers are also being explored.
  • Cable management isn’t more complex for immersion cooling, just different. CGG utilizes multiple generations of GRC immersion racks, which reflect the evolution of rPDU and network switch placement; these have moved between dry space in the rack and mounts on the back of the tank. GRC’s latest immersion-cooling product, the ICEraQ 10, utilizes dry space in the top-rear of the rack for rPDUs, with networking switches mounted on the front behind a panel.
  • Lastly, beware of crickets. It turns out that crickets have a taste for the particular immersion oil GRC uses, so an open bay door may lead to an extra visitor. Just like a loose serial number sticker, there is no threat of damage – just an unexpected find when opening the rack lid.

Immersion Cooling Answers the Call for Sustainable Data Centers of the Future

The engineered benefits of immersion cooling can’t be denied – higher utilization of space and power, with lower CAPEX and OPEX relative to a traditional air-cooled facility. However, I didn’t need to visit an immersion-cooled facility to understand the cost savings. My biggest takeaway was the correction of my misconception that an immersion-cooled data center would be dramatically different from an air-cooled facility. It was familiar, like other data centers I have toured. The only difference in physical infrastructure was the rack itself, with IT infrastructure mounted vertically rather than horizontally. Immersion-ready servers are available today, with expanding partnerships among chip, server, and immersion vendors working on the next generation of compute. And while a few operational differences need to be planned for, the necessary adjustments are, to my surprise, relatively minor. So can immersion cooling be part of the solution that supports sustainable data centers of the future? After my visit to CGG’s High Performance Compute Center, I believe it just might be.

This November, Dell’Oro Group will launch a new Data Center Physical Infrastructure subscription program. As the program’s lead analyst, I will dig deeper into the market outlook, growth drivers, and the competitive landscape of the data center physical infrastructure market. I will quantify industry trends and developments, providing a timely, accurate, and detailed analysis. To learn more about Dell’Oro Group’s new Data Center Physical Infrastructure program, please contact us at dgsales@delloro.com.

[wp_tech_share]

 

Dell’Oro Group published an update to the Ethernet Controller & Adapter 5-Year Forecast Report in July 2021. Revenue for the worldwide Ethernet controller and adapter market is projected to increase at a 4% compound annual growth rate (CAGR) from 2020 to 2025, reaching nearly $3.2 billion. The increase is partly driven by the migration to server access speeds of 100 Gbps and higher.
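For readers who want to sanity-check the arithmetic, a 4% CAGR ending at roughly $3.2 billion in 2025 implies a 2020 base of about $2.6 billion. The implied base below is derived for illustration, not a figure quoted from the report.

```python
# Compound annual growth rate (CAGR) arithmetic behind the forecast figures.
# The implied 2020 base is derived for illustration, not taken from the report.

cagr = 0.04
years = 5             # 2020 -> 2025
revenue_2025_b = 3.2  # ~$3.2 billion forecast for 2025

implied_2020_base_b = revenue_2025_b / (1 + cagr) ** years
print(f"Implied 2020 market size: ~${implied_2020_base_b:.2f}B")
for year in range(2020, 2026):
    value = implied_2020_base_b * (1 + cagr) ** (year - 2020)
    print(f"{year}: ${value:.2f}B")
```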

The ramp of 25 Gbps port shipments has been strong since 28 Gbps SerDes became available in 2016. 25 Gbps has already displaced 10 Gbps to become the dominant speed in revenue terms, as it gains broad adoption across Cloud service providers (SPs) and high-end enterprises. However, we project that 100 and 200 Gbps ports will overtake 25 Gbps in revenue as early as 2023.

Below, we identify the market and technology drivers that are likely to propel the adoption of next-generation server connectivity at 100 Gbps and beyond:

  • 50 Gbps ports, based on two 28 Gbps SerDes lanes, have reached mainstream deployment among some of the major Cloud SPs. However, with the exponential growth of network traffic and the proliferation of cloud computing, the Top 4 US Cloud SPs are demanding even higher server access speeds than the rest of the market. The availability of 56 Gbps SerDes since late 2018 has prompted some of the Top 4 US Cloud SPs to upgrade their networks to 400 Gbps, with upgrades of server network connectivity to 100 Gbps for general-purpose computing in progress.
  • Higher server access speeds of up to 200 Gbps, based on two lanes of 112 Gbps SerDes (the lane arithmetic is sketched after this list), could begin to ramp for general-purpose computing at the Top 4 US Cloud SPs following network upgrades to 800 Gbps as early as 2022.
  • The increase in demand for bandwidth-hungry AI applications will continue to push the boundaries of server connectivity. Today, 100 Gbps is commonly used to interconnect accelerated servers, while general-purpose servers are connected at 25 or 50 Gbps. As 100 Gbps becomes the standard connection for general-purpose servers at the major Cloud SPs over the next several years, accelerated servers may be connected at twice that data rate, 200 Gbps.
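As referenced in the list above, a port’s nominal speed is simply its SerDes lane count multiplied by each lane’s effective (post-encoding) data rate. The mapping below is a simplified sketch of the configurations discussed here; effective rates are approximate, since signaling rates carry encoding and FEC overhead.

```python
# Simplified mapping of SerDes generation and lane count to nominal port speed.
# Effective per-lane rates are approximate (signaling rate minus encoding/FEC overhead).

SERDES_EFFECTIVE_GBPS = {28: 25, 56: 50, 112: 100}  # signaling rate -> usable rate

def port_speed(serdes_gbps, lanes):
    return SERDES_EFFECTIVE_GBPS[serdes_gbps] * lanes

configs = [(28, 1), (28, 2), (28, 4), (56, 2), (112, 2)]
for serdes, lanes in configs:
    print(f"{lanes} x {serdes}G SerDes -> {port_speed(serdes, lanes)} Gbps port")
```

This is why the jump from 50 Gbps (2 x 28G) to 200 Gbps (2 x 112G) server access tracks each new SerDes generation rather than requiring more lanes per port.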

To learn more about the Ethernet Controller and Adapter market, or if you need to access the full report, please contact us at dgsales@delloro.com.

About the Report

The Dell’Oro Group Ethernet Controller and Adapter 5-Year Forecast Report provides a complete, in-depth analysis of the market with tables covering manufacturers’ revenue; average selling prices; and unit and port shipments by speed (1 Gbps, 10 Gbps, 25 Gbps, 40 Gbps, 50 Gbps, and 100 Gbps) for Ethernet and Fibre Channel Over Ethernet (FCoE) controllers and adapters. The report also covers Smart NIC and InfiniBand controllers and adapters. To purchase this report, please contact us at dgsales@delloro.com.