Using the matrix to help Meta gear up

Just 12 months after it was created, in December 2004, one million people were active on Facebook. As of December 2021, it had an average of 1.93 billion daily active users. EPFL is in a unique collaboration with its parent company, Meta, around distributed deep learning research.

For a user base of this size, large-scale automated systems are needed to understand the user experience and to ensure accuracy and success. EPFL’s Machine Learning and Optimization Laboratory (MLO), led by Professor Martin Jaggi, is actively collaborating with Meta Platforms, Inc., Facebook’s parent company, to solve this unique challenge.

With funding from EPFL’s EcoCloud Research Center, MLO collaborates with Meta through internships at the company for MLO researchers and the use by Meta of a pioneering MLO invention: PowerSGD. MLO is helping Meta to analyze and better understand millions of users’ experiences while at the same time respecting user privacy. This requires collaborative learning, that is, privacy-preserving analysis of information from many devices for the training of a neural network that gathers, and even predicts, patterns of behavior.

To do this, a key strategy is to divide the study of these patterns over “the edge”, using both the user’s device and others that sit between it and the data center, as a form of distributed training. This requires a fast flow of information and efficient analysis of the data. PowerSGD is an algorithm that compresses model updates in matrix form, allowing a drastic reduction in the communication required for distributed training. When applied to standard deep learning benchmarks, such as image recognition or transformer models for text, the algorithm saves up to 99% of the communication while retaining good model accuracy.
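The low-rank idea behind PowerSGD can be illustrated with a toy rank-1 sketch in pure Python (an illustrative assumption on our part, not Meta's production code; the real algorithm uses configurable ranks, orthogonalization and error feedback):

```python
# Toy rank-1 PowerSGD-style compression (illustrative sketch only).
# A gradient matrix M (n x m) is compressed into two vectors p (n) and
# q (m) via one step of power iteration, so a worker transmits
# n + m numbers over the network instead of n * m.

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def transpose_matvec(M, u):
    return [sum(M[i][j] * u[i] for i in range(len(M))) for j in range(len(M[0]))]

def compress_rank1(M, q_init):
    """One power-iteration step: p = M q / ||M q||, then q = M^T p."""
    p = matvec(M, q_init)
    norm = sum(x * x for x in p) ** 0.5 or 1.0
    p = [x / norm for x in p]
    q = transpose_matvec(M, p)
    return p, q  # the compressed update that is communicated

def decompress_rank1(p, q):
    """Reconstruct the rank-1 approximation p q^T on the receiving side."""
    return [[pi * qj for qj in q] for pi in p]

# Example: a matrix that is exactly rank 1 is recovered exactly.
M = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]   # every row is a multiple of [1, 2]
p, q = compress_rank1(M, [1.0, 0.0])
M_hat = decompress_rank1(p, q)
```

For a 1024 x 1024 gradient matrix, this rank-1 scheme would transmit 2,048 numbers instead of 1,048,576, a reduction of over 99%.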

PowerSGD was used to speed up training of the XLM-R model by up to 2x. XLM-R is a critical Natural Language Processing model powering most of the text understanding services at Meta. Facebook, Instagram, WhatsApp and Workplace all rely on XLM-R for their text understanding needs. Use cases include:
1) Content Integrity: detecting hate speech, violence, bullying and harassment;
2) Topic Classification: classifying topics to enable feed ranking in products such as Facebook;
3) Business Integrity: detecting any policy violation for Ads across all products;
4) Shops: providing better product understanding and recommendations for shops.

“There are three aspects to the process. The first is to develop gradient compression algorithms to speed up the training, reducing the time required to prepare this information for its transfer to a centralized hub. The second is efficient training of the neural network within a data center – it would normally take several weeks to process all the information, but we distribute the training, reducing computation from months to days,” said MLO doctoral researcher Tao Lin.

Tao Lin of MLO

As a third aspect, privacy is a constant factor under consideration. “We have to distinguish between knowledge and data. We need to ensure users’ privacy by making sure that our learning algorithms can extract knowledge without extracting their data and we can do this through federated learning,” continued Lin.

The PowerSGD algorithm has been gaining in reputation over the last few years. The developers of deep learning software PyTorch have included it as part of their software suite (PyTorch 1.10), which is used by Meta, OpenAI, Tesla and similar technology corporations that rely on artificial intelligence. The collaboration with Meta is due to run until 2023.

Related news: EPFL’s PowerSGD Algorithm Features in Software Industry

Authors: Tanya Petersen, John Maxwell

Source: EPFL

This content is distributed under a Creative Commons CC BY-SA 4.0 license. You may freely reproduce the text, videos and images it contains, provided that you indicate the author’s name and place no restrictions on the subsequent use of the content. If you would like to reproduce an illustration that does not contain the CC BY-SA notice, you must obtain approval from the author.


We stand with Ukraine

EcoCloud strongly condemns Russia’s military invasion and acts of war in Ukraine, as well as the dreadful violation of international humanitarian and human rights law. We are deeply shocked by the tragedy currently unfolding in Ukraine, and we fully support everyone affected by the war.

The EcoCloud community calls on the different governments to take immediate action to protect everyone in that country, particularly its civilian population and people affiliated with its universities. Now more than ever, we must promote our societal values (justice, freedom, respect, community, and responsibility) and confront this situation collectively and peacefully to end this nonsensical war.


ASPLOS is back – in person

After going virtual in 2020 and 2021, ASPLOS is returning to Lausanne for the 2022 edition, 28th February to 4th March.

The 2022 edition of ASPLOS marks its 40th anniversary. ASPLOS emerged in 1982 as the premier conference for researchers from a variety of software and hardware system communities to collaborate and engage technically. It has been ahead of the curve on technologies such as RISC and VLIW processors, small- and large-scale multiprocessors, clusters and networks-of-workstations, optimizing compilers, RAID, and network-storage system designs.

Today, as we enter the post-Moore era, ASPLOS 2022 re-establishes itself as the premier platform for cross-layer design and optimization to address fundamental challenges in computer system design in the coming years.

We look forward to welcoming you at the SwissTech Center, EPFL.

Official Website

© 2022 Cyber-Defence Campus

Heterogeneous computing creates new electrical-level vulnerabilities

Under the initiative of the armasuisse – Cyber-Defence Campus, a team of EPFL scientists, including CYD Doctoral Fellow Dina Mahmoud of PARSA, recently presented the first proof-of-concept for undervolting-based fault injection from the programmable logic of a field programmable gate array (FPGA) to the software executing on a processing system in the same system-on-chip (SoC). The team also proposes a number of future research directions, which, if addressed, should help to ensure the security of today’s heterogeneous computing systems.

Most cyberattacks, such as ransomware, exploit vulnerabilities in software. While often neglected, hardware-based attacks can be just as powerful, on top of being more difficult to patch, because the underlying vulnerability remains in the deployed hardware. Hardware attacks in which adversaries have physical access to their target devices have long been investigated. However, with the world wide web and the possibility of accessing computing resources remotely in the cloud, remotely controlled hardware attacks have become a reality. Examples of remote attacks include fault-injection attacks, which cause computation or data-manipulation errors, and side-channel attacks, which extract secrets from power or electromagnetic side channels.

With Moore’s law losing pace in recent years, customizable hardware combining various types of processing units in one heterogeneous system has become a global trend to increase performance. Since heterogeneous computing is a relatively recent phenomenon, not all of its security vulnerabilities have been fully understood or investigated. To better understand the landscape of cybersecurity in relation to heterogeneous systems, we surveyed state-of-the-art research on electrical-level attacks and defenses. We focused on exploits which leverage vulnerabilities caused by electrical signals or their coupling. For example, demanding more power than the power supply can provide results in a lowered voltage for the entire system; the undervolting can affect the functioning of the circuits (e.g., in a computer) and cause faults. Alternatively, an adversary can monitor minute variations in the voltage waveform and use them to classify or even fully uncover the operations executed by the victim.

Our survey, which will appear in ACM Computing Surveys, addresses electrical-level attacks on central processing units (CPUs), field-programmable gate arrays (FPGAs), and graphics processing units (GPUs), the three processing units most frequently combined in heterogeneous platforms. We discuss whether electrical-level attacks targeting only one processing unit can extend to the heterogeneous system as a whole, and highlight open research directions necessary for ensuring the security of these systems in the future.
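The undervolting mechanism described above can be sketched with a toy Ohm's-law model of a shared power-delivery network (all numbers below are illustrative assumptions of ours, not measurements from the survey):

```python
# Toy model of power-supply undervolting (illustrative numbers only).
# A shared power-delivery network (PDN) has a small series resistance,
# so current drawn by one component lowers the voltage seen by all:
#     V_chip = V_supply - I_total * R_pdn

V_SUPPLY = 1.00     # nominal supply voltage in volts (assumed)
R_PDN = 0.02        # effective PDN resistance in ohms (assumed)
V_MIN_CPU = 0.85    # minimum voltage for correct CPU operation (assumed)

def chip_voltage(i_cpu, i_fpga):
    """Voltage actually delivered to the chip for given currents (amps)."""
    return V_SUPPLY - (i_cpu + i_fpga) * R_PDN

# Normal operation: the FPGA draws modest current, CPU voltage stays safe.
v_normal = chip_voltage(i_cpu=2.0, i_fpga=1.0)   # 0.94 V, above V_MIN_CPU

# Attack: malicious FPGA circuits draw heavy current; because the supply
# is shared, the voltage seen by the CPU drops below its minimum and the
# CPU's computation may fault.
v_attack = chip_voltage(i_cpu=2.0, i_fpga=8.0)   # 0.80 V
faulted = v_attack < V_MIN_CPU
```

The point of the sketch is the coupling: the aggressor never touches the CPU directly, it only loads the power supply they share.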

In the survey, we discuss a number of system-level vulnerabilities which have not yet been investigated. One of the open research questions we highlight is the possibility of inter-component fault-injection attacks. In our subsequent work, which will be presented in March at the Design, Automation and Test in Europe conference (DATE 2022), we demonstrate the feasibility of such an attack. We show the first undervolting attack in which circuits implemented in the FPGA programmable logic act as an aggressor while the CPU, residing on the same system-on-chip, is the victim. We program the FPGA to deploy malicious hardware circuits that force the FPGA to draw considerable current and cause a drop in the power supply voltage. Since the power supply is shared, the voltage drop propagates across the entire chip. As a result, the computation performed by the CPU faults. If exploited in a remote setting, this attack can lead to denial of service or a data breach. With these findings, we further confirm the need for continued research on the security of heterogeneous systems in order to prevent such attacks.


Funding: The CYD Fellowships are supported by armasuisse Science and Technology.

References:
Mahmoud, Dina Gamaleldin Ahmed Shawky; Hussein, Samah; Lenders, Vincent; Stojilović, Mirjana: FPGA-to-CPU Undervolting Attacks. 25th Design, Automation and Test in Europe – DATE 2022, Antwerp, Belgium [Virtual], March 14-23, 2022.

Mahmoud, Dina G.; Lenders, Vincent; Stojilović, Mirjana: Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era. ACM Computing Surveys, Volume 55, Issue 3, April 2022, Article No.: 58. DOI: 10.1145/3498337


Intel funds EcoCloud Midgard-based research

An exciting new development in the progress of Midgard, a novel re-envisioning of the virtual memory abstraction ubiquitous to computer systems, sees a tech leader funding research that will bring together experts from Yale, the University of Edinburgh and EcoCloud at EPFL.

Global semiconductor manufacturer Intel is sponsoring an EcoCloud-led project entitled “Virtual Memory for Post-Moore Servers”, which is part of its wider research into improving power performance and total cost of ownership for servers in large-scale datacenters.

What is the Post-Moore era?

Moore’s law was conceived in 1965 by Gordon Moore, who would later become CEO of Intel. Moore predicted that the number of transistors in a dense integrated circuit would double roughly every two years. The prediction has been remarkably accurate up to now, but we are reaching the stage where physical limitations will curtail this pattern within the next couple of years: we are approaching the “Post-Moore era”. Many observers are optimistic about the continuation of technological progress in a variety of other areas, including new chip architectures, quantum computing and artificial intelligence.
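Moore's prediction is simple to state numerically: a doubling every two years means transistor counts grow as 2^(years/2). A quick sketch (the 1971 Intel 4004 starting point is an illustrative choice of ours, not from the article):

```python
# Moore's law as arithmetic: the transistor count doubles roughly every
# two years, i.e. count(t) = count(0) * 2 ** (years / doubling_period).

def moore(count0, years, doubling_period=2.0):
    return count0 * 2 ** (years / doubling_period)

# Starting from ~2,300 transistors (the Intel 4004 of 1971), 40 years of
# doubling every two years (20 doublings) predicts counts in the billions,
# roughly where high-end chips actually were by the early 2010s.
predicted_2011 = moore(2300, 40)   # 2300 * 2**20
```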

Midgard is a radical new technology which helps provide optimizations for memory in data centers with an innovative, highly efficient namespace for access and protection control. Efficient memory protection is a foundation for virtualization, confidential computing, use of accelerators, and emerging computing paradigms such as serverless computing.

The Midgard layer is an intermediate stratum that makes staggering performance gains possible for data servers as memory grows, and it is compatible with modern operating systems such as Linux, Android, macOS and Windows.

The project Virtual Memory for Post-Moore Servers aims to disrupt traditional server technology, targeting full-stack evaluation and hardware/software co-design, based on Midgard’s radical approach to virtual memory.

Midgard is a consortium of the following principal investigators at EcoCloud, University of Edinburgh and Yale:
David Atienza, Abhishek Bhattacharjee, Babak Falsafi, Boris Grot and Mathias Payer.

Link to the Midgard Website



Compusapien: More computing, less energy

© cherezoff / Adobe Stock

Today’s data centres have an efficiency problem – much of their energy is used not to process data, but to keep the servers cool. A new server architecture under development by the EU-funded COMPUSAPIEN project could solve this.

As the digital revolution continues to accelerate, so too does our demand for more computing power. Unfortunately, current semiconductor technology is energy-inefficient, meaning so too are the servers and cloud technologies that depend on them. In fact, as much as 40 % of a server’s energy is used just to keep it cool. “This problem is aggravated by the fact that the complex design of the modern server results in a high operating temperature,” says David Atienza Alonso, who heads the Embedded Systems Laboratory (ESL) at the Swiss Federal Institute of Technology Lausanne (EPFL). “As a result, servers cannot be operated at their full potential without the risk of overheating and system failures.”

To tackle this problem, the EU has issued several policies addressing the increasing energy consumption of data centres, including the JRC EU Code for Data Centres. According to Atienza Alonso, meeting the goals of these policies requires an overhaul of computing server architecture and the metrics used to measure their efficiency – which is exactly what the COMPUSAPIEN (Computing Server Architecture with Joint Power and Cooling Integration at the Nanoscale) project aims to do. “The project intends to completely revise the current computing server architecture to drastically improve its energy efficiency and that of the data centres it serves,” explains Atienza Alonso, who serves as the project’s principal investigator.

Cooling conundrum

At the heart of the project, which is supported by the European Research Council, is a disruptive, 3D architecture that can overcome the worst-case power and cooling issues that have plagued servers. What makes this design so unique is its use of a heterogeneous, many-core architecture template with an integrated on-chip microfluidic fuel cell network, which allows the server to simultaneously provide both cooling and power. According to Atienza Alonso, this design represents the ultimate solution to the server cooling conundrum. “This integrated, 3D cooling approach, which uses tiny microfluidic channels to both cool servers and convert heat into electricity, has proved to be very effective,” he says. “This guarantees that 3D many-core server chips built with the latest nanometre-scale process technologies will not overheat and stop working.”

A greener cloud

Atienza Alonso estimates that the new 3D heterogeneous computing architecture template, which recycles the energy spent in cooling with the integrated micro-fluidic cell array (FCA) channels, could recover 30-40 % of the energy typically consumed by data centres. With more gains expected when the FCA technology is improved in the future, the energy consumption (and environmental impact) of a data centre will be drastically reduced, with more computing being done using the same amount of energy. “Thanks to integration of new optimised computing architectures and accelerators, the next generation of workloads on the cloud (e.g. deep learning) can be executed much more efficiently,” adds Atienza Alonso. “As a result, servers in data centres can serve many more applications using much less energy, thus dramatically reducing the carbon footprint of the IT and cloud computing sector.”




EcoCloud organizes Cloud Sustainability Days 2021

The Cloud Sustainability Days, organised by the EPFL EcoCloud Center, are beginning.



Please use the basement entrance.

This conference is organised by the EPFL EcoCloud Center for sustainable cloud technologies, with the participation of the Swiss Datacenter Efficiency Association, the EPFL FUSTIC association, members of EPFL Faculty, and industrial partners.

The program will cover sustainability in cloud infrastructure with focus on IT (servers, storage, network) and DC (electricity, cooling, heat recycling) infrastructure efficiency and emissions.

Attending the events will be representatives from IBM, Hewlett Packard Enterprise (HPE), Microsoft and the EPFL Circular Economy Initiative.

The EcoCloud Sustainability Days will take place in the SwissTech Convention Center, on the EPFL campus. The conference is free for members of the EPFL campus community and CHF 25 for the general public. Registration is mandatory, since attendance is limited to 150 people.

We look forward to seeing you here!


Cloud Sustainability Days 2021
SwissTech Convention Center


A paradigm shift in virtual memory use: Midgard

Researchers at Ecocloud, the EPFL Center for Sustainable Cloud Computing, have pioneered an innovative approach to implementing virtual memory in data centers, which will greatly increase server efficiency.

Virtual memory has always been a pillar of memory isolation, protection and security in digital platforms. Its use is non-negotiable, even in widely used hardware accelerators such as GPUs, NICs and FPGAs, and in secure CPU architectures. It is therefore vital that silicon should be used as frugally as possible.

As services host more data in server memory for faster access, the traditional virtual memory technologies that look up data in server memory and check for protection have emerged as a bottleneck. Modern graph analytics workloads (e.g., on social media) spend over 20% of their time in virtual memory translation and protection checks. Server virtualization for cloud computing, to help increase utilization of infrastructure and return on investment in data centers, dramatically exacerbates this problem by requiring lookups and protection checks across multiple layers of guest (customer) and host (cloud provider) software.

The way in which virtual memory is assigned in these servers is critical because, with such huge quantities of data involved, changes in strategy can have a massive effect on server efficiency and data security.

“Virtual memory technology has been around since the 1960s, and lays the foundation for memory protection and security in modern digital platforms,” write the authors of “Rebooting Virtual Memory with Midgard”, a paper they will present next month at ISCA’21, the flagship conference in computer architecture.

Memory has become the most precious silicon resource in data centers in recent years, as more services are brought online. Virtual memory traditionally divides physical storage into fixed-size units for optimal capacity management. This division slows down lookups and protection checks as memory capacity increases, because large regions of memory in application software (e.g., GBs) are disintegrated into millions of pages (e.g., KBs). Modern chips (e.g., the recently announced Apple M1) employ thousands of table entries per processor to perform lookups and protection checks for each memory access.
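A back-of-the-envelope sketch shows the scale of this fragmentation (the 4 KB page size is a conventional assumption on our part; the text above says only "e.g., KB"):

```python
# How a GB-scale region disintegrates into pages: with conventional
# 4 KB pages, every gigabyte of application memory becomes ~262,000
# page-table entries that lookups and protection checks must cover.

PAGE_SIZE = 4 * 1024   # 4 KB, the common page size (an assumed value)

def pages_needed(region_bytes, page_size=PAGE_SIZE):
    # Round up: a partial page still needs a full page-table entry.
    return (region_bytes + page_size - 1) // page_size

GB = 1024 ** 3
pages_per_gb = pages_needed(1 * GB)      # 262,144 pages for 1 GB
pages_for_16gb = pages_needed(16 * GB)   # 4,194,304 pages for 16 GB
```

At these scales, the thousands of cached table entries per processor cover only a tiny sliver of the pages in use, which is why translation becomes a bottleneck.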

Namespaces are used to store unique references for data, in structured hierarchies. Removing some of this hierarchy and reducing the number of translations would represent a net gain in efficiency. The authors propose Midgard, which introduces a namespace for data lookup and memory protection checks in the memory system without making any modifications to the application software or the programming interface in modern platforms (e.g., Linux, Android, macOS/iOS).

With Midgard, data lookups and protection checks are done directly in the Midgard namespace in on-chip memory, and a translation to fixed-size pages is only needed for access to physical memory. Unlike traditional virtual memory, whose overhead grows with memory capacity, Midgard future-proofs virtual memory: the overhead of translation and protection checks to physical memory decreases as the growing on-chip memory capacity of future products filters more traffic before it reaches physical memory.

Analytic and empirical results described in the paper show remarkable performance from Midgard compared to traditional technology, and even to rival new technologies (e.g., the larger fixed-size pages used in certain applications). At low loads the Midgard system was 5% behind standard performance, but with a 256 MB aggregate cache it can match and even outperform traditional systems in terms of virtual memory overheads.

Figure 1: The Average Memory Access Time for address translations on low-memory, high-memory and Midgard systems.

The authors conclude: “This paper is the first of several steps needed to demonstrate a fully working system with Midgard. We focused on a proof-of-concept software-modelled prototype of key architectural components. Future work will address the wide spectrum of topics needed to realize Midgard in real systems.”

Rebooting Virtual Memory with Midgard
S. Gupta; A. Bhattacharyya; Y. Oh; A. Bhattacharjee; B. Falsafi and M. Payer
ISCA 2021, the 48th International Symposium on Computer Architecture, online conference, June 14-19, 2021.

Midgard etymology: a middle realm between heaven (Asgard) and hell (Helheim)


Google Scholarship for EcoCloud researcher Simla Burcu Harma

Simla Burcu Harma has received the Generation Google Scholarship for Women in Computer Science, one of only two students in Switzerland to receive this award. The Generation Google Scholarship for Women in Computer Science is awarded every year to a selection of 20 PhD students across Europe, Africa and the Middle East who have demonstrated a passion for technology and academic excellence, and who have proven themselves as exceptional leaders and role models.

Simla comes from Turkey and is pursuing a PhD at the Parallel Systems Architecture Lab (PARSA) under the supervision of Prof. Babak Falsafi. Her research interests lie in the area of systems for machine learning. She is working on the EcoCloud project ColTraIn, which aims to restore datacenter homogeneity and co-locate training and inference without compromising inference efficiency or quality of service (QoS) guarantees.

Her work has included contributions to the ColTraIn HBFPEmulator.


ColTraIn Releases Open-source HBFP Training Emulator

DNN training and inference share similar basic operators but have fundamentally different requirements. The former is throughput-bound and relies on high-precision floating-point arithmetic for convergence, while the latter is latency-bound and tolerant of low-precision arithmetic. Both workloads require high computational capabilities and can benefit from hardware accelerators. The disparity in resource requirements forces datacenter operators to choose between deploying separate custom accelerators for training and inference, or using training accelerators for inference.

However, neither of these two options is an optimum solution. While the former results in datacenter heterogeneity and higher management costs, the latter results in inefficient inference. Moreover, dedicated inference accelerators face load fluctuations, leading to overprovisioning and low average utilization.

The objective of EPFL’s ColTraIn: Co-located DNN Training and Inference team of PARSA and MLO is to restore datacenter homogeneity and co-locate training and inference without compromising inference efficiency or quality of service (QoS) guarantees. ColTraIn aims to overcome two key challenges: (1) the difference in the arithmetic representation used in workloads, and (2) the scheduling of training tasks in inference-bound accelerators. The recent release of HBFP (Hybrid Block Floating Point) meets the first challenge.

HBFP trains DNNs with dense, fixed-point-like arithmetic for most operations without sacrificing accuracy, thus facilitating effective co-location. More specifically, HBFP offers the accuracy of 32-bit floating-point with the numeric and silicon density of 8-bit fixed-point for many models (ResNet, WideResNet, DenseNet, AlexNet, LSTM, and BERT).
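The core idea of block floating point can be sketched in a few lines (a toy illustration of ours, not the released HBFP emulator; the function names are hypothetical): a block of values shares one exponent, and each value keeps only a short integer mantissa.

```python
# Toy block floating-point quantization (illustrative; the released HBFP
# emulator is more sophisticated). All values in a block share a single
# exponent; each value stores only a small signed-integer mantissa, so
# most of the arithmetic can run on dense fixed-point hardware.
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to one shared exponent + integer mantissas."""
    max_abs = max(abs(x) for x in block) or 1.0
    exponent = math.floor(math.log2(max_abs)) + 1    # shared across the block
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    mantissas = [round(x / scale) for x in block]    # fit in mantissa_bits
    return exponent, mantissas, scale

def bfp_dequantize(mantissas, scale):
    """Reconstruct approximate floats from the shared scale and mantissas."""
    return [m * scale for m in mantissas]

# Powers of two are represented exactly in this toy scheme.
block = [0.50, -0.25, 0.125, 0.0625]
exp, mants, scale = bfp_quantize(block)   # mants == [64, -32, 16, 8]
approx = bfp_dequantize(mants, scale)
```

Because every value in the block shares the exponent, per-value storage shrinks to the mantissa width, which is where the 8-bit fixed-point density comes from.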

The open-source project repository is available for ongoing research on training DNNs with HBFP.

The ColTraIn team is working to address the second challenge of developing a co-locating accelerator. The design adds training capabilities to an inference accelerator and pairs it with a scheduler that takes both resource utilization and tasks’ QoS constraints into account to co-locate DNN training and inference.
