The OCP Global Summit returned to an exclusively in-person format in 2022. The community was as excited as ever to get together, with a record 3500+ people in attendance. In this new blog, exclusively for OCP, Lucas Beran, Principal Analyst for the Data Center Physical Infrastructure market, discusses the three key takeaways from the event.
At the end of April, Nokia, a fairly new entrant to the data center switch space, made the groundbreaking announcement that the company will be supplying its 7250 IXR networking gear to Microsoft, the third-largest Cloud Service Provider (SP).
As I noted in my 2022 prediction blog published earlier this year, I have been anticipating a fair number of new switch vendor insertions at the large hyperscalers in 2022, as the 400 Gbps upgrade cycle starts to materialize outside of Google and Amazon. Silicon diversity would be one of the major reasons for these potential changes in the vendor landscape, as these hyperscalers need to keep pricing pressure on Broadcom, the dominant merchant silicon supplier to date. Supply challenges further accelerated the need for silicon diversity. However, what is intriguing is that Nokia’s 7250 IXR is based on Broadcom’s merchant silicon, not Nokia’s FP5 proprietary chips. So what will Nokia bring to the table?
What’s in it for Microsoft?
Although Nokia is a fairly new entrant in the data center switch space, the company is among the leading vendors in the router market and in several other Telecom SP segments. Clearly, Nokia has significant experience in systems design, which, as we learned from the company’s spokesperson, allowed it to achieve power savings at a system level. As a reminder, as network speeds move to 400 Gbps and beyond, power consumption becomes one of the most constraining factors limiting what Cloud SPs can build and deploy in their data centers. In fact, Microsoft already faced this challenge with its 400 Gbps deployment, as it had to wait for Broadcom’s Jericho 2C+ chips, which consume less power than the prior-generation Jericho 2 chips.
Furthermore, Nokia has made significant contributions to the SONiC ecosystem. (SONiC is the open-source software built by Microsoft that runs in its data center networks.) We view this Microsoft data center win as a reward for the company’s contribution. In fact, this quid pro quo relationship expands well beyond the data center win into several other areas. For example, Nokia is also working with Microsoft on developing 4G LTE and 5G private wireless for the enterprise segment. This collaboration brings together Nokia’s virtualized radio access network (vRAN) and multi-access edge cloud (MEC) with the Azure Private Edge platform.
Additionally, Nokia has the potential to leverage its coherent optics technology, which the firm obtained with its Elenion acquisition, to drive cost and power savings at a system level for data center interconnect (DCI) applications.
Last, but not least, although Nokia’s 7250 IXR is built on Broadcom’s silicon, which does not satisfy the silicon diversity requirement, it will nonetheless provide Microsoft with another route to access Broadcom chips, which is critical in a supply-constrained environment.
Where will Nokia’s 7250 IXR be deployed?
The initial deployment of Nokia’s modular switches will occur in the spine, which Microsoft refers to as Tier 2, but may expand to DCI applications at a later stage. As a reminder, Microsoft has been deploying predominantly Arista in spine/DCI but has also recently qualified Cisco (with its Silicon One-based 8000 chassis). Nokia will also supply fixed form factors for Top-of-Rack (ToR) applications. It is worth noting that Microsoft has always had a multi-vendor strategy for its ToR applications, where volume is high but the margin is thin. So far, the company has deployed a mix of Cisco, Dell, and Mellanox (Nvidia).
What does this mean for incumbent vendors?
While we view this announcement as a major win for Nokia and as validation of its competitive positioning in the data center switch market, we believe that Microsoft will strive to keep its existing suppliers happy and provide them with enough motivation to compete for its business. Our interviews revealed that Arista is expected to remain the preferred supplier for spine/DCI applications at Microsoft during the 400 Gbps upgrade cycle. Additionally, we expect Microsoft to go through major expansion and upgrade activities this year and that its data center spending will be strong enough to benefit all vendors – incumbents as well as new entrants.
For more details and insights on cloud service providers’ data center network design and a list of suppliers, please contact us at email@example.com.
Data centers are the backbone of our digital lives, enabling the real-time processing and aggregation of data and transactions, as well as the seamless delivery of applications to both enterprises and their end customers. Data centers have been able to grow to support ever-increasing volumes of data and transaction processing thanks in large part to software-based automation and virtualization, allowing enterprises and hyperscalers alike to adapt quickly to changing workload volumes as well as physical infrastructure limitations.
Despite their phenomenal growth and innovation, the principles of which are being integrated into service provider networks, data centers of all sizes are about to undergo a significant expansion as they are tasked with processing blockchain, bitcoin, IoT, gigabit broadband, and 5G workloads. In our latest forecast, published earlier this month, we expect worldwide data center capex to reach $350 B by 2026, representing a five-year projected growth rate of 10%. We also forecast hyperscale cloud providers to double their data center spending over the next five years.
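To make the arithmetic behind these two figures concrete, here is a minimal sketch. It assumes the 10% figure is a compound annual growth rate over the five years ending in 2026, which the forecast summary does not state explicitly; the function names are illustrative only.

```python
def implied_base(target, annual_rate, years):
    """Starting value that compounds to `target` after `years` at `annual_rate`."""
    return target / (1 + annual_rate) ** years


def implied_annual_rate(multiple, years):
    """Compound annual rate needed to grow by `multiple` over `years`."""
    return multiple ** (1 / years) - 1


# Assumption: $350 B in 2026 is reached by compounding 10% per year for five years.
base_2021 = implied_base(350.0, 0.10, 5)

# Doubling spending over five years, as forecast for hyperscale cloud providers.
doubling_rate = implied_annual_rate(2.0, 5)

print(f"Implied 2021 base: ${base_2021:.0f} B")
print(f"Annual growth needed to double in five years: {doubling_rate:.1%}")
```

Under that assumption, the forecast implies a 2021 base of roughly $217 B, and doubling in five years corresponds to annual growth of about 14.9%, which is faster than the overall market rate.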
Additionally, enterprises are all becoming smarter about how to balance and incorporate their private clouds, public clouds, and on-premises clouds for the most optimal and efficient processing of workloads and application requests. Similar to highly-resilient service provider networks, enterprises are realizing that the distribution of workload processing allows them to scale faster with more redundancy. Despite the general trend towards migrating to the cloud, enterprises will continue to invest in on-premises infrastructure to handle workloads that involve sensitive data, as well as those applications that are very latency-sensitive.
As application requests, change orders, equipment configuration changes, and other general troubleshooting and maintenance requests continue to increase, anticipating and managing the necessary changes in multi-cloud environments becomes exceedingly difficult. Throw in the need to quickly identify and troubleshoot network faults at the physical layer and you have a recipe for a maintenance nightmare and, more importantly, substantial revenue loss due to the cascading impact of fragmented networks that are only peripherally integrated.
Although automation and machine learning tools have been available for some time, they are often designed to automate application delivery within one of the multiple cloud environments, not across multiple clouds and multiple network layers. Automating IT processes across both physical and virtual environments and across the underlying network infrastructure, compute, and storage resources has been a challenge for some time. Each layer has its own distinct set of issues and requirements.
New network rollouts or service changes resulting in network configuration changes are typically very labor-intensive and frequently yield faults in the early stages of deployment that require many hours of labor to resolve.
Similarly, configuration changes sometimes result in redundant or mismatched operations due to the manual entry of these changes. Without a holistic approach to automation, there is no way to verify or prevent the introduction of conflicting network configurations.
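As a rough illustration of what verifying changes before they reach the network could look like, the sketch below groups pending change requests by device and setting and flags any target where the requested values disagree. The `ChangeRequest` structure, device names, and settings are hypothetical and are not drawn from any particular vendor platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """A hypothetical pending configuration change (illustrative only)."""
    requester: str  # team or tool submitting the change
    device: str     # target device name
    setting: str    # configuration key, e.g. "Ethernet1.mtu"
    value: str      # desired value


def find_conflicts(requests):
    """Return {(device, setting): [requests]} for targets with differing values."""
    by_target = defaultdict(list)
    for req in requests:
        by_target[(req.device, req.setting)].append(req)
    return {
        target: reqs
        for target, reqs in by_target.items()
        if len({r.value for r in reqs}) > 1
    }


pending = [
    ChangeRequest("netops", "spine-1", "Ethernet1.mtu", "9214"),
    ChangeRequest("app-team", "spine-1", "Ethernet1.mtu", "1500"),
    ChangeRequest("netops", "tor-7", "Vlan20.description", "storage"),
    ChangeRequest("it-infra", "tor-7", "Vlan20.description", "storage"),
]

conflicts = find_conflicts(pending)
for (device, setting), reqs in conflicts.items():
    wanted = ", ".join(f"{r.requester}={r.value}" for r in reqs)
    print(f"conflict on {device} {setting}: {wanted}")
```

Here the two requests for `tor-7` agree and are merely redundant, while the two requests for `spine-1` disagree and are flagged. A real platform would need a far richer model of dependencies between settings; this only shows the basic check that manual entry lacks.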
Finally—and this is just as true of service provider networks as it is of large enterprises and hyperscale cloud providers—detecting network faults is often a time-consuming process, principally because network faults are often handled passively until they are located and resolved manually. Traditional alarm reporting followed by manual troubleshooting must give way to proactive and automatic network monitoring that quickly detects network faults and uses machine learning to rectify them without any manual intervention whatsoever.
Automating a Data Center’s Full Life Cycle
As the size and complexity of data centers continue to increase and as workload and application changes increase, the impact on the underlying network infrastructure can be difficult to predict. Various organizations both within and outside the enterprise have different requirements that all must somehow be funneled into a common platform to prevent conflicting changes to the application delivery layer all the way to the network infrastructure. These organizations can also have drastically different timeframes for the expected completion of changes largely due to siloed management of different portions of the data center, as well as different diagnostic and troubleshooting tools in use by the network operations team and the IT infrastructure teams.
In addition to pushing on their equipment vendor and systems integrator partners to deliver platforms that solve these challenges, large enterprises also want platforms that give them the ability to automate the entire lifecycle of their networks. These platforms use AI and machine learning to build a thorough and evolving view of underlying network infrastructure to allow enterprises to:
- Support automatic network planning and capacity upgrades by modeling how the addition of workloads will impact current and future server requirements as well as the need to add switching and routing capacity to support application delivery.
- Implement network changes automatically, reducing the need for manual intervention and thereby reducing the possibility of errors.
- Constantly provide detailed network monitoring at all layers and provide proactive fault location, detection, and resolution while limiting manual intervention.
- Simplify the service and application provisioning process by providing a common interface that then translates requests into desired network changes.
Ultimately, one of the key goals of these platforms is to create a closed loop between network management, control, and analysis capabilities so that changes in the upper-layer services and applications can automatically drive defined changes in the underlying network infrastructure. For this to become a reality in increasingly complex data center network environments, these platforms must provide some critical functions, including:
- Providing a unified data model and data lakes across multiple cloud environments and multi-vendor ecosystems
  - This function has been a long-standing goal of large enterprises and telecommunications service providers. Ending the swivel-chair approach to network management and delivering error-free network changes with minimal manual intervention are key functions of any data center automation platform.
- Service orchestration across multiple, complex service flows
  - This function has also been highly sought after by large enterprises and service providers alike. For service providers, SDN overlays were intended to add these functions and capabilities to their networks. Deployments have yielded mixed but generally favorable results. Nevertheless, the principles of SDN continue to proliferate into other areas of the network, largely due to the desire to streamline and automate the service provisioning process. The same can be said for large enterprises and data center providers.
Although these platforms are intended to serve as a common interface across multiple business units and network layers, their design and deployment can be modular and gradual. If a large enterprise wants to migrate to a more automated model, it can do so at a pace that is suited to the organization’s needs. Automation can be introduced first at the network infrastructure layer and then at the application layer. Over time, with AI and machine learning tools aggregating performance data across both layers, correlations between application delivery changes and their impact on network infrastructure can be determined more quickly. Ultimately, service and network lifecycle management can be simplified and expanded to cover hybrid cloud or multi-vendor environments.
We believe that these holistic platforms that bridge the worlds of telecommunications service providers and large enterprise data centers will play a key role in helping automate data center application delivery by providing a common window into the application delivery network as well as the underlying network infrastructure. The result will be more efficient use of network resources, a reduction in the time required to make manual configuration changes to the network, a reduction in the programming load for IT departments, and strict compliance with SLA guarantees for key end customers and application provider partners.
As pandemic-related headwinds started to ease, we were optimistic about a return to higher growth in data center infrastructure spending in 2021. The Cloud was entering an expansion cycle and demand signals in the Enterprise were gaining momentum. While data center capex grew 9% in 2021, which was in line with our prior projections, growth was mainly driven by the higher cost of data center equipment rather than by unit volume. Server unit growth, which was flat for the year, was constrained by component shortages and long lead times. Deliveries for networking and physical infrastructure equipment are also facing a mounting backlog. Furthermore, higher supply chain costs, from increased commodity, expedite, and logistics costs, led to higher system prices. Our 2022 outlook is more optimistic, with projected data center capex growth of 17%, accompanied by double-digit growth in server unit shipments. We identify the following key trends that could shape the dynamics of data center capex in 2022.
Hyperscale Cloud on Expansion Cycle
The Top 4 Cloud service providers, namely Amazon, Google, Meta (formerly Facebook), and Microsoft, are expected to increase data center capex by over 30% in 2022. Investments will go towards the replacement of aged servers, increased deployment of accelerated computing, and servers for new data centers in more than 30 regions that are scheduled to launch in 2022. Furthermore, infrastructure that was planned last year but not deployed due to extended equipment lead times will provide an additional tailwind as deliveries are fulfilled in 2022.
Supply Chain Stabilizing
Generally, the major Cloud service providers have weathered this tough supply chain climate better than the rest of the market, given their strong visibility into their demand and their ability to proactively increase inventory levels of crucial components and build redundancies into their supply chains. On the other hand, data center capex growth among Tier 2 and 3 Cloud service providers and the Enterprise has been supply-constrained. There is some consensus that supply chain disruptions are starting to stabilize and could ease by the second half of 2022. Lead times for servers could improve sooner than those for other data center equipment such as networking, given servers’ relatively larger scale and lower product mix.
Metaverse Could Drive Opportunities In AI Infrastructure
Some of the major Cloud service providers, such as Apple, Meta, Microsoft, and Tencent, have announced plans to enrich their metaverse offerings for both enterprise and consumer applications. This would require increased investments in new infrastructures, such as servers with accelerated co-processors, low-latency networking, and enhanced thermal management solutions. Chip manufacturers and major Cloud service providers will be developing specialized processors for AI applications. The ecosystem would need to evolve to enable the community of AI application developers to broaden the reach of AI into enterprises. AI infrastructure is costly and will be a major capex driver. For instance, we estimate that the cost of AI infrastructure is largely responsible for Meta’s plans to increase capex by approximately 60% this year.
New Server Architectures On The Horizon
Intel is releasing a new processor platform, Sapphire Rapids, later this year. Sapphire Rapids will feature the latest in server interconnect technologies, such as PCIe 5, DDR5, and, more importantly, CXL. These new high-speed interfaces could alleviate system bandwidth constraints, enabling more processor cores and memory to be packaged into a single server. CXL would enable memory sharing between the CPU and other co-processors within the server and rack, enabling data-intensive applications such as AI to access memory more efficiently and at lower latencies. AMD and ARM will also incorporate these new interfaces within their processor platforms. We expect these enhancements to kick off a multi-year journey of new server architecture developments.
Let’s Not Forget About Server Connectivity
Last but not least on this list, server connectivity will also need to evolve continuously so that it does not become a bottleneck between the server and the rest of the network. The hyperscale Cloud service providers have been deploying in production the latest generation of network interface cards (NICs) based on 56 Gbps PAM-4 SerDes, at up to 100 Gbps for general-purpose workloads and up to 200 Gbps for advanced workloads such as AI. The Enterprise is fully embracing 25 Gbps NICs, and we anticipate the number of 25 Gbps ports to overtake that of 10 Gbps later this year. Smart NICs, or data processing units (DPUs), are being deployed by the major Cloud service providers across their infrastructure to improve server utilization and to accelerate latency-sensitive applications such as AI. Outside of the hyperscalers, Smart NIC adoption is still in its nascent stage. However, given that most of the network adapter vendors have a Smart NIC solution available in the market, enterprises potentially have a wide range of choices to fit their applications and budget.