Concerns about datacentre energy consumption are nothing new. Noting a wasteful mismatch between desktop central processing units (CPUs) and datacentre requirements, researchers set out to deliver a chip that was fit for purpose. The result was commercialized and sparked a trend for Arm-based server chips for the cloud. To celebrate 20 years of HiPEAC, we caught up with Babak Falsafi (EPFL) to find out more.

Article originally published by HiPEAC

Feel free to visit the EuroCloud project page

In the early 2010s, data centres were expanding rapidly, and their energy requirements were spiralling. This bloated energy consumption was partly because the hardware and operating systems used for servers had originally been designed for desktop computers, explains Babak Falsafi, professor in the School of Computer and Communication Sciences at EPFL. ‘Server chips were being built with x86 (Intel / AMD) technologies. These chips were primarily power-density limited: the cores were huge, and they had functionality that was not needed at the time by a lot of applications in the data centre, such as data serving, web serving, data caching, and analytics.’

The CPU cores were not the only mismatch for the workloads, Babak notes; other design choices also pushed up energy demands: ‘The server chips also integrated a lot of on-chip memory to cool down the chip (memory has a much lower power density), which was at best left unused by the applications and at worst slowed down lookups for information that resided in the larger on-chip memory.’

From EU-funded project to Arm-based server chip

An EU-funded project, EuroCloud Server, whose consortium comprised Arm, EPFL, imec, Nokia and the University of Cyprus, was launched in 2010 to develop cloud-native servers running cloud services on 64-bit out-of-order Arm cores derived from the Cortex-A15 and on 3D-stacked DRAM. The project resulted in ‘scale-out processors’, which organized silicon resources into physical servers called ‘pods’ that shared I/O and memory ports and pins with other pods.

‘We showed for the first time (even before Arm cores could be used for servers) that designing optimal chips from a performance and power-density perspective would give a 10x improvement from the silicon and electricity used for popular open-source software stacks representing datacentre workloads. This 10x improvement came partly from having a lot more cores for a given electricity budget and partly from having a faster instruction supply for the deep software stacks in the cloud,’ says Babak.

This work laid the foundation for an Arm-based server CPU for the cloud, the Cavium ThunderX, which employed 48 in-order Arm cores and offered an order-of-magnitude larger core-to-cache silicon ratio than conventional server CPUs. However, Babak notes that, at the time, the software ecosystem (Linux) for Arm servers was not available; as a result, Cavium pivoted to the high-performance computing (HPC) market rather than the cloud.

This has since changed: ‘A few years later, Amazon Web Services and HiSilicon started not only building Arm servers but also building a community to create the Linux server ecosystem, which is now mature,’ says Babak. ‘NVIDIA is benefitting from this effort in their future chips, which use Arm cores and tightly integrate graphics processing units (GPUs), CPUs, memory and the network. With liquid cooling, which is expensive but needed for GPUs, NVIDIA can also achieve much higher power densities for their Arm CPUs, allowing them to shift much of the power to computing rather than to less productive on-chip memory.’

The future of server design

With unprecedented energy demands projected for artificial intelligence (AI) workloads, datacentre efficiency is once again in the spotlight. However, Babak notes that this should be kept in perspective. ‘Datacentre electricity consumption has been flat worldwide for some time, at roughly 200 TWh per year, thanks to consolidation and improvements in efficiency. Goldman Sachs projects that this number will grow at 16% per year from 2019 until 2030, and AI will contribute a bit to it,’ he says. ‘But one should not forget that server chips have also seen modest exponential growth in TDP (thermal design power) – a measure of the maximum amount of heat a chip can dissipate – in CPUs since 2019, and a doubling of TDP per generation in AI chips.’
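
To put the projected growth in context, the minimal sketch below simply compounds the figures quoted above: a roughly 200 TWh-per-year baseline growing at 16% per year from 2019 to 2030. Treating 200 TWh as the flat 2019 starting value and applying simple compound growth are assumptions made purely for illustration; they are not figures from the article or from the Goldman Sachs projection itself.

```python
# Back-of-the-envelope illustration (not from the article): compound the
# quoted ~200 TWh/year worldwide datacentre consumption at the projected
# 16% annual growth rate. Baseline year and flat starting value are
# assumptions for illustration only.

baseline_twh = 200                  # approximate worldwide consumption (TWh/year)
growth_rate = 0.16                  # projected annual growth quoted above
start_year, end_year = 2019, 2030

for year in range(start_year, end_year + 1):
    projected = baseline_twh * (1 + growth_rate) ** (year - start_year)
    print(f"{year}: ~{projected:.0f} TWh")
```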

What does this mean for the design of future servers? ‘We are at the limits of power density today, and there are no simple silver bullets to improve efficiency. The latest Arm server chips from Ampere and the latest server chips from Intel are now resorting to dramatically more efficient cores (in terms of both area and electricity budget) to improve performance density and performance per watt,’ says Babak. ‘We are witnessing the post-Moore era of computing, where the next waves of efficiency will come from a tighter integration of technologies, from algorithms to housing infrastructure. The next big change in the data centre will most likely come from integration at the rack scale, not just for AI but also for the cloud.’

Further reading

Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, Onur Kocberber, Javier Picorel, Almutaz Adileh, Djordje Jevdjic, Sachin Idgunji, Emre Ozer, and Babak Falsafi, ‘Scale-out processors’, Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA ’12), IEEE Computer Society, USA, 2012, pp. 500–511. doi: 10.5555/2337159.2337217

Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, Onur Kocberber, Javier Picorel, Almutaz Adileh, Djordje Jevdjic, Sachin Idgunji, Emre Ozer, and Babak Falsafi, ‘RETROSPECTIVE: Scale-Out Processors’, in José F. Martínez and Lizy John (eds), ISCA@50 25-Year Retrospective: 1996–2020, ACM SIGARCH and IEEE TCCA, June 2023. bit.ly/isca50_retrospective

Babak Falsafi, Michael Ferdman and Boris Grot, ‘Server Architecture From Enterprise to Post-Moore’, in IEEE Micro, vol. 44, no. 5, pp. 65–73, Sept.–Oct. 2024. doi: 10.1109/MM.2024.3418975