Prof. Babak Falsafi reviews the state of the art in datacenter energy efficiency and presents his vision of the future.
Original article published here in German:
Wie effizient sind Rechenzentren wirklich?
and in French:
Efficacité réelle des centres de données
The datacenter market is constantly evolving - and so is energy consumption. Until now, energy efficiency has often been expressed in terms of the PUE (Power Usage Effectiveness) indicator, but this is increasingly out of touch with reality. How can datacenter efficiency and emissions be accurately determined?
Electricity markets have been extremely volatile over the past two years, due to concerns about energy supply. With increasing digitalization and the introduction of artificial intelligence (AI) in all sectors, concern is growing not only about electricity consumption in datacenters, but also about the growing demand and ecological impact of IT. A study by Schneider Electric forecasts a 5% annual increase in electricity consumption for the entire IT sector between 2023 and 2030, 75% of which is expected to be attributed to datacenters (driven by AI) and mobile networks (due to the move to 5G).
Electricity consumption by datacenters worldwide has remained relatively stable over the past ten years, constituting around 1.5% of global electricity consumption. In service-oriented economies, this percentage is somewhat higher: in Switzerland, for example, it is 4%. This stability can be explained by the fact that companies are increasingly turning to the cloud, and cloud providers are keeping a close eye on electricity consumption to maximize the return on their investment. Colocation datacenters, which house customers' IT equipment, are continually developing more efficient infrastructures for cooling, power distribution and heat recovery.
A 2022 study by the Uptime Institute (see graph below) shows that the average PUE of datacenters worldwide has been declining only slowly in recent years, which means that energy efficiency, by this measure, is improving only slowly. PUE, the conventional energy efficiency indicator, is obtained by dividing the total energy consumed by the datacenter by the energy consumed by its IT equipment.
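The PUE definition can be written as a short calculation. The following is a minimal sketch; the function name and the annual energy figures are hypothetical and chosen only for illustration.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return PUE: total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total energy cannot be lower than IT energy")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical annual figures, not taken from the Uptime Institute study.
annual_total_kwh = 1_500_000.0
annual_it_kwh = 1_000_000.0
print(f"PUE = {pue(annual_total_kwh, annual_it_kwh):.2f}")  # PUE = 1.50
```

A value of 1.0 would mean that every kilowatt-hour reaches the IT equipment; anything above 1.0 is infrastructure overhead.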
What are the limits of PUE?
Like many other indicators, PUE has its limitations. First of all, it does not take into account the total environmental impact of datacenter operations. It only considers the share of electricity consumed by the building's infrastructure, including cooling and power distribution, and does not take into account the various ways in which the overall flow of energy within the datacenter can contribute to sustainability or reduce emissions. Modern datacenters rely on both waste heat recycling technologies and on-site renewable energy, neither of which is reflected in the PUE value.
Secondly, the PUE value is subject to fluctuations depending on factors such as the season, current datacenter load and even the time of day. This makes it an unreliable measure of efficiency, especially for datacenters operating under variable environmental conditions and loads.
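A small sketch of this fluctuation follows. The four readings are invented for illustration and assumed to cover intervals of equal length. It compares snapshot PUE values with the energy-weighted PUE over the whole period.

```python
# Hypothetical readings over equal-length intervals:
# (label, total facility kW, IT kW). Values are illustrative only.
readings = [
    ("winter night", 120.0, 100.0),
    ("winter day", 230.0, 200.0),
    ("summer night", 135.0, 100.0),
    ("summer day", 420.0, 300.0),
]

# Snapshot PUE for each reading.
instantaneous = {label: total / it for label, total, it in readings}

# Energy-weighted PUE: total energy over the period divided by IT energy.
period_pue = sum(t for _, t, _ in readings) / sum(i for _, _, i in readings)

# Naive average of the snapshots, which ignores how much load each one carries.
mean_of_ratios = sum(instantaneous.values()) / len(instantaneous)

for label, value in instantaneous.items():
    print(f"{label:13s} PUE = {value:.2f}")
print(f"energy-weighted PUE over the period = {period_pue:.3f}")
print(f"naive mean of the snapshots         = {mean_of_ratios:.3f}")
```

With these assumed numbers the snapshots range from 1.15 to 1.40, so a single reading says little about the facility unless the measurement period and load are also reported.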
One of the biggest limitations of the PUE value is that it says little about IT efficiency. Ironically, inefficient servers can make the PUE value look surprisingly low. This is because IT energy is the denominator of the ratio: the more energy the IT devices consume relative to the infrastructure overhead, the lower (and therefore better) the PUE. This encourages the provision of surplus IT resources to artificially improve PUE values. Even if the IT equipment is efficient, which is probably the case in newly built datacenters, the degree of utilization of IT systems also has a considerable influence on operational efficiency, but the PUE value doesn't tell you whether servers are being used at 20% or 80%.
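The effect of inefficient servers on PUE can be illustrated with a toy model. Everything in it is an assumption made for the example: the overhead model (a fixed part plus a part proportional to IT load), its coefficients, and the two IT loads, which are meant to represent the same useful work done on efficient and on inefficient servers.

```python
def facility_kw(it_kw: float,
                fixed_overhead_kw: float = 100.0,
                overhead_per_it_kw: float = 0.10) -> float:
    """Total facility power under a simple, assumed overhead model."""
    return it_kw + fixed_overhead_kw + overhead_per_it_kw * it_kw


def pue_for(it_kw: float) -> float:
    """PUE implied by the assumed overhead model at a given IT load."""
    return facility_kw(it_kw) / it_kw


# Same useful work in both cases; only the server efficiency differs.
scenarios = {"efficient servers": 800.0, "inefficient servers": 1600.0}
for label, it_kw in scenarios.items():
    print(f"{label:20s} total = {facility_kw(it_kw):6.0f} kW, "
          f"PUE = {pue_for(it_kw):.3f}")
```

Under these assumptions the inefficient fleet draws almost twice the total power (1860 kW against 980 kW) yet reports the lower PUE (about 1.16 against 1.225).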
In Switzerland, thanks to exemplary work in the field of sustainable datacenters, we are aiming for a PUE value of 1.15. This means that over 80% of the electricity in these datacenters is used for IT equipment (servers, storage, network). The question of energy consumption and efficiency in datacenters therefore revolves around IT.
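The share of electricity that reaches the IT equipment follows directly from the PUE definition: it is the reciprocal of the PUE. A minimal sketch, in which the helper name is hypothetical:

```python
def it_share(pue_value: float) -> float:
    """Fraction of total facility energy that reaches the IT equipment."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1")
    return 1.0 / pue_value


target_pue = 1.15  # target quoted in the text
print(f"IT share at PUE {target_pue}: {it_share(target_pue):.1%}")   # 87.0%
print(f"Infrastructure overhead: {1 - it_share(target_pue):.1%}")    # 13.0%
```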
Where is IT heading?
Technological forecasts point to spectacular growth in electricity consumption in IT. On the one hand, silicon manufacturing technologies have benefited for four decades from a doubling of chip density every two years (Moore's Law). This increase in chip density has been accompanied by a corresponding improvement in energy efficiency, so that denser chips have been able to operate at higher frequencies, without increasing overall energy consumption.
But progress in silicon density has since run into physical limits. While there are improvements in algorithms, software and chip design that enable platform specialization, none of these will deliver exponential gains comparable to density scaling for all datacenter services.
On the other hand, demand for computing has grown roughly sixfold every year on average over the past ten years, due to the rapid growth of artificial intelligence (AI). This is happening at the same time as Moore's Law is slowing down, meaning that new computing devices now have to be developed and deployed faster than ever before.
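To get a feel for the gap between the two trends, the following sketch compounds both rates over ten years. It takes the figures in the text at face value (a sixfold increase per year and a doubling every two years) and uses the historical Moore's Law rate as a best case, even though that rate is now slowing.

```python
YEARS = 10
demand_factor_per_year = 6.0           # figure quoted in the text
density_factor_per_year = 2.0 ** 0.5   # doubling every two years (historical rate)

demand_growth = demand_factor_per_year ** YEARS
density_growth = density_factor_per_year ** YEARS

print(f"density: x{density_factor_per_year:.2f} per year, "
      f"x{density_growth:.0f} over {YEARS} years")
print(f"demand:  x{demand_factor_per_year:.0f} per year, "
      f"x{demand_growth:,.0f} over {YEARS} years")
print(f"gap after {YEARS} years: x{demand_growth / density_growth:,.0f}")
```

Even at the historical rate, density grows about 32-fold in a decade, while demand compounding at the quoted rate grows by a factor of roughly 60 million. The difference has to be covered by more hardware, more specialized hardware, or more energy.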
The above graph illustrates the rapid increase in Thermal Design Power (TDP) - the maximum heat dissipation in watts - of server processors (CPUs), the basic computing units in datacenters. CPU TDP rose from single-digit values in the 1990s to around 100 W in 2000, then stabilized for a decade thanks to energy-efficient designs. However, as gains in efficiency and density get smaller and smaller, TDPs increase rapidly for the latest CPUs. Graphics processors, the GPUs that form the basis of AI, have a dramatically higher TDP value: 300 W for an Nvidia A100 card, and the latest product, the H100 card, can even reach 700 W.
These trends call for appropriate methods and indicators to assess the energy efficiency of IT devices and loads.
What are the right metrics for IT?
Given the predicted growth in IT energy consumption and the advances being made in datacenter infrastructure, new indicators and methods for assessing datacenter energy efficiency and emissions are needed. This applies to both infrastructure and IT equipment. These indicators must not only take into account heat recycling and the use of renewable energy in the building's infrastructure, but also the efficiency of the various components of IT equipment, including computing logic (e.g. CPUs, GPUs and accelerators), memory, data storage and network equipment. In addition, precise measurement methods and appropriate software and hardware instruments are needed to determine these indicators.
It is also essential to look at other indicators which, until now, have tended to be in the background, but are becoming increasingly important. Workload utilization, for example, offers a more nuanced view of how efficiently IT resources are being used. A server that is only 20% utilized most of the time is not just an underused resource, but also a flagrant inefficiency, leading directly to wasted energy and higher operating costs.
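The cost of low utilization can be sketched with a simple power model. The model is an assumption made for illustration: power is taken to rise linearly from a hypothetical idle draw of 200 W to a hypothetical peak of 400 W, and delivered work is taken to be proportional to utilization. Real servers differ in both respects.

```python
def server_power_w(utilization: float,
                   idle_w: float = 200.0,
                   peak_w: float = 400.0) -> float:
    """Assumed linear power model between idle and peak draw."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return idle_w + (peak_w - idle_w) * utilization


def power_per_unit_work(utilization: float) -> float:
    """Watts spent per unit of work, with work proportional to utilization."""
    return server_power_w(utilization) / utilization


for u in (0.2, 0.8):
    print(f"utilization {u:.0%}: {server_power_w(u):.0f} W drawn, "
          f"{power_per_unit_work(u):.0f} W per unit of work")
```

Under these assumptions, the server at 20% utilization spends 1200 W per unit of work against 450 W at 80%, because the idle draw is paid regardless of how much work is done.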
Technological quality is also essential. Although not a traditional indicator, it serves as a benchmark for evaluating the devices and methods used in a datacenter. When choosing technologies, operators must ensure maximum performance and focus on the most modern and efficient options. For example, opting for flash storage instead of hard disks can significantly reduce power consumption and cooling requirements, while speeding up data access. Similarly, choosing fiber optic cables instead of copper cables for networks not only increases speed, but also minimizes energy consumption. The same applies to servers, power supplies and power distribution units: models that stand out for their energy efficiency, reliability and longevity reduce both energy consumption and total cost of ownership.
Finally, the maximum permissible operating temperature is another important indicator. Traditionally, datacenters operated at lower temperatures to minimize the risk of overheating. Modern devices, however, have been designed to operate safely at higher temperatures. By adapting maximum permitted operating temperatures to these higher limits, companies can drastically reduce energy consumption for cooling, which has a considerable impact on overall datacenter efficiency.
The SDEA label is comprehensive
In order to measure datacenter efficiency and emissions holistically, the Swiss Datacenter Efficiency Association (SDEA) - a consortium of sustainability pioneers from industry and science - created the SDEA label in 2020. SDEA's KPI (key performance indicator) tool features a calculator that records heat recycling, use of renewable energy, consolidation and virtualization of workloads, utilization of servers, storage and network components, data compression, first-class component technology and permissible operating temperature. According to a recent study by the International Energy Agency (IEA), the SDEA label is the only certification for datacenters that offers a quantitative ranking rather than simply making recommendations.