Jan 13, 2025
[Shenzhen, China, January 13, 2025] Today, Huawei holds a conference on the Top Ten Trends of Data Center Facility 2025. At the conference, Yao Quan, President of the Data Center Facility Domain, explains the top ten trends, aiming to inject new impetus into the development of the data center (DC) industry in the AI era, offer insights into its transformation, and lead the industry's leapfrog development.
He states that DCs have shifted from general-purpose computing power to intelligent computing power thanks to continuous innovation in AI foundation model technologies. Server performance and power have increased greatly, and the construction of clusters with 1,000 GPUs, 10,000 GPUs, and 100,000 GPUs has become the norm. The DC industry is embracing unprecedented development opportunities while also facing challenges in reliability, high power, high electricity demand, and uncertainty.
Based on in-depth insights and long-term practices, Huawei releases the top ten trends of Data Center Facility 2025, centering on reliability, flexibility, and sustainable development. The aim is to share its insights and thoughts on AI DC facilities, build a highly reliable computing base, and power the digital era forward.
Trend 1: Reliability Becomes the Primary Core Requirement of Intelligent Computing DCs
In DC construction, safety takes priority over cost. As the value of AI devices surges and the fault domain continues to expand in the intelligent computing era, reliability becomes the primary core requirement of intelligent computing DCs. DC reliability is essentially reliability throughout the lifecycle, covering components, products, architecture, services, and O&M. A DC with low reliability incurs higher operating costs, so truly low costs can be achieved only if reliability is ensured.
Trend 2: The Isolated Architecture Is the Best Choice for Ensuring the Reliability of Intelligent Computing Facilities
The power density of intelligent computing centers keeps rising. Electrical equipment is usually characterized by high voltage and large current, so ensuring its safe and reliable operation is critical. Deploying DC electrical equipment remotely is preferred to ensure stable service running. If electrical equipment is deployed in the main equipment room, it must be isolated from the main services and deployed in a standard manner. Fire resistance duration, water-based fire extinguishing, emergency ventilation, and one-click power-off requirements must be taken into account to minimize service impacts.
Trend 3: Uninterrupted Cooling Is the Mandatory Capability for High-density Intelligent Computing
In the AI era, the coexistence of air cooling and liquid cooling will be a long-term process. Liquid cooling is an inevitable trend, and uninterrupted cooling will become a mandatory capability for high-density intelligent computing. Uninterrupted cooling means zero cooling interruption when the DC is running properly and quick cooling recovery in case of exceptions, which keeps DCs running stably.
Trend 4: AI Will Significantly Improve Proactive Security in DC Operation and Maintenance
With AI technologies, faults such as power failures, fires, and high temperatures in DCs can be accurately prevented. This enables a shift from passive response to proactive maintenance and allows potential risks to be identified in advance, significantly improving DC reliability.
Trend 5: Professional Services Are a Solid Guarantee for the Reliability of DC Operation
A DC usually has a service life of 10 to 15 years, and maintenance matters more than equipment over the DC's entire lifecycle. Professional services hold the key to the long-term, reliable running of the DC. Professional deployment and full-process management of DC delivery ensure that no potential risks are left behind during deployment. On top of that, AI technologies are introduced to implement predictive maintenance instead of fault response, ensuring DC reliability throughout the lifecycle.
Trend 6: Modular Architecture Is the Key to Coping With the Uncertainty of AI DC Requirements
AI DCs need a modular architecture to flexibly address the uncertainty of their requirements. A modular architecture features standardized equipment rooms, modular functions, and decoupled electromechanical devices, which enable on-demand deployment and elastic scaling of core subsystems as well as flexible adaptation to future service evolution. Take the Wuhu DC in China as an example. By adopting the modular architecture, the DC was delivered within three months and supports elastic scaling in the future.
Trend 7: Subsystem Prefabrication Is an Effective Method for Fast AI DC Delivery
Prefabrication provides greater production efficiency. DCs that house prefabricated subsystems can better meet the requirements of AI services in terms of elasticity and quick rollout. Subsystem prefabrication is not simple assembly but solution productization. It has to go through professional design, simulation, testing, and automated tooling to ensure product delivery quality. In addition, factory prefabrication and pre-commissioning cut the onsite construction workload by 90%, greatly shortening the delivery period and ensuring fast, high-quality delivery of AI DCs.
Trend 8: The High Efficiency of Power Supply Is Increasingly Valuable in AI DCs
High-density and high-computing scenarios pose tough challenges for heat dissipation. As DCs move from air cooling to liquid cooling, power supply efficiency becomes a major factor in energy efficiency. When evaluating the power supply efficiency of DCs, the focus should be on the efficiency of the parallel system rather than that of a single device, and on architecture innovation. For example, a UPS working in super economy control operation (S-ECO) mode can reach an efficiency of 99.1% and implement 0 ms switchover between modes.
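As a rough illustration of why power supply efficiency matters at scale, the following minimal sketch estimates the energy lost per year in the power supply stage. Only the 99.1% figure comes from the text above. The 10 MW load and the 97% comparison efficiency are hypothetical values chosen for illustration, and the model assumes a constant load all year.

```python
HOURS_PER_YEAR = 8760


def annual_loss_mwh(load_mw: float, efficiency: float) -> float:
    """Energy lost in the power supply stage per year, in MWh.

    Input power is load / efficiency, so the loss is
    load * (1 / efficiency - 1). Assumes a constant load.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return load_mw * (1 / efficiency - 1) * HOURS_PER_YEAR


load_mw = 10.0  # hypothetical constant load, not from the source

s_eco_loss = annual_loss_mwh(load_mw, 0.991)    # efficiency quoted in the text
baseline_loss = annual_loss_mwh(load_mw, 0.97)  # hypothetical comparison value

print(f"Loss at 99.1% efficiency: {s_eco_loss:.0f} MWh/year")
print(f"Loss at 97.0% efficiency: {baseline_loss:.0f} MWh/year")
print(f"Difference: {baseline_loss - s_eco_loss:.0f} MWh/year")
```

Because the loss scales linearly with load, the same percentage gap grows in absolute terms as cluster power increases.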
Trend 9: AI Enables the Improvement of Comprehensive Energy Efficiency in DCs
In addition to improving power supply and cooling efficiency, AI technologies can contribute more by enabling the linkage between layers 1 and 2. There are millions of optimization parameters in the air-liquid cooling scenario, which makes the optimization exponentially more complex. To achieve the optimal cooling effect, AI energy efficiency optimization technology can replace traditional manual optimization. The application of AI technologies makes DCs more energy-efficient.
Trend 10: Computing-electricity Collaboration Will Become a New Mode of DC Construction
Computing power holds the key to AI, and electricity is essential for computing power. As the energy consumption of DCs keeps increasing, direct green power supply will be one solution to reducing energy consumption for DCs. In addition, as the load link in power generation-grid-load-storage integration, DCs can be linked with the power grid to improve grid usage efficiency (GUE) through frequency regulation and peak shaving. DCs can flexibly schedule loads based on AI training and inference requirements to achieve overall optimal efficiency. Looking to the future, synergy between computing and energy will serve as a new mode of DC construction, boosting the sustainable development of DCs.
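One simple way to picture flexible load scheduling is valley filling: deferrable work (for example, training jobs that can wait) is placed into the hours with the lowest existing load, so the overall peak does not rise. The sketch below is a hypothetical illustration only. The function name, the hourly figures, and the greedy strategy are assumptions, not a description of Huawei's scheduling method.

```python
def valley_fill(base_load, flexible_energy, step=1.0):
    """Greedily place deferrable energy into the lowest-load hours.

    base_load: non-deferrable load per hour (MW), e.g. inference traffic.
    flexible_energy: total deferrable energy to schedule (MWh).
    step: amount of energy placed per iteration (MWh).
    Returns the resulting total load per hour.
    """
    schedule = list(base_load)
    remaining = flexible_energy
    while remaining > 1e-9:
        chunk = min(step, remaining)
        # Pick the hour whose total load is currently lowest.
        hour = min(range(len(schedule)), key=schedule.__getitem__)
        schedule[hour] += chunk
        remaining -= chunk
    return schedule


# Hypothetical hourly base load (MW) and 6 MWh of deferrable work.
base = [4.0, 3.0, 2.0, 2.0, 5.0, 6.0]
scheduled = valley_fill(base, flexible_energy=6.0)

# Naive alternative: spread the deferrable work evenly over all hours.
naive = [load + 6.0 / len(base) for load in base]

print("Peak with valley filling:", max(scheduled))
print("Peak with even spreading:", max(naive))
```

In this toy example, valley filling absorbs all of the deferrable work without raising the peak, whereas spreading it evenly pushes the peak higher. A real scheduler would also need to account for job deadlines, electricity prices, and grid signals.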
In the AI era, Huawei Data Center Facility will focus on quality and technological innovation to build a highly reliable, flexible, and sustainable power supply solution for intelligent computing centers, help customers and partners seize intelligent computing opportunities, and maximize the output of every watt in an effort to power the digital era forward.