Facebook and EPFL have launched a collaborative program to carry out research in areas of common interest to both organizations. Facebook seeks to leverage EPFL’s proven expertise in Computer Science and Engineering, enabling the flow of technology from one of the most renowned research institutions to the leading American social media conglomerate. The collaboration will also help Facebook strengthen its position in Switzerland and gain access to some of the best academic minds in Europe.

The following projects have already been lined up for the Full-System Accelerated and Secure ML Collaborative Research program:

Training for Recommendation Models on Heterogeneous Servers
Distributed Transformer Benchmarks
Full-System API Inference to Enforce Security
Communication Stacks for µServices in Datacenters

Each of these projects will be conducted by a member of the expert team from EPFL. The team includes David Atienza, Babak Falsafi, Martin Jaggi, and Mathias Payer. Babak Falsafi will be the point of contact for the engagement.

Training for Recommendation Models on Heterogeneous Servers

This project aims to develop strategies to automatically select the best accelerator for a specific DNN training job. The research by David Atienza and team will develop the necessary software libraries to allocate workload efficiently by considering performance, power, and accuracy constraints. Meta-learning algorithms will be created to train DL models and configure their hyper-parameters in an automated way, outperforming current state-of-the-art approaches. This approach is expected to result in significant savings in the total training time and improved robustness against minimization for smaller memory size designs.
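
As a rough illustration of constraint-aware accelerator selection, the sketch below picks the fastest candidate that stays within a power budget and meets an accuracy floor. It is not the project's library: the `AcceleratorProfile` type, the `select_accelerator` function, and all profile numbers are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AcceleratorProfile:
    """Hypothetical per-accelerator estimates for one training job."""
    name: str
    est_train_hours: float   # estimated total training time
    avg_power_watts: float   # estimated average power draw
    est_accuracy: float      # estimated final model accuracy


def select_accelerator(
    profiles: Sequence[AcceleratorProfile],
    max_power_watts: float,
    min_accuracy: float,
) -> Optional[AcceleratorProfile]:
    """Return the fastest profile satisfying both constraints, or None."""
    feasible = [
        p for p in profiles
        if p.avg_power_watts <= max_power_watts
        and p.est_accuracy >= min_accuracy
    ]
    return min(feasible, key=lambda p: p.est_train_hours, default=None)


# Made-up numbers, purely for illustration.
CANDIDATES = [
    AcceleratorProfile("gpu", 4.0, 300.0, 0.80),
    AcceleratorProfile("cpu", 20.0, 150.0, 0.80),
    AcceleratorProfile("low_precision_asic", 2.0, 100.0, 0.76),
]

if __name__ == "__main__":
    choice = select_accelerator(CANDIDATES, max_power_watts=350.0, min_accuracy=0.79)
    print(choice.name if choice else "no feasible accelerator")
```

A real system would have to estimate these figures per model and per device rather than take them as fixed inputs, which is where the learned strategies described above would come in.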

Distributed Transformer Benchmarks

MLBench, a framework for distributed machine learning, aims to serve as an easy-to-use and fair benchmarking suite for algorithms as well as for systems (software frameworks and hardware). It will provide reusable and reliable reference implementations of distributed ML training algorithms. MLBench supports a wide range of platforms, ML frameworks, and machine learning tasks. Its goal is to benchmark all or most currently relevant distributed execution frameworks. Lead researcher Martin Jaggi and team will soon release the first results and reference code for distributed training (starting with CIFAR-10 and ImageNet, in both PyTorch and TensorFlow).

Full-System API Inference to Enforce Security

Mathias Payer and team aim to build an API flow graph (AFG) that encodes all valid API interactions and their parameters. The proposed algorithm will build the global AFG by analyzing all uses of a function in the system’s source code. The researchers will leverage test projects that provide a large corpus of test cases and input files for a wide variety of programs. The data set will help infer API usage by monitoring the state construction through the provided seeds and examples.
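
To make the idea of an API flow graph more concrete, here is a minimal sketch that assumes API usage has already been recorded as sequences of call names. It is a simplification for illustration, not the researchers' algorithm: it ignores parameters, learns only direct call-to-call transitions, and the names `build_afg`, `is_valid`, and the example traces are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

START = "<start>"  # hypothetical marker for the state before any API call


def build_afg(traces: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Record every observed transition between consecutive API calls."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for trace in traces:
        prev = START
        for call in trace:
            graph[prev].add(call)
            prev = call
    return dict(graph)


def is_valid(graph: Dict[str, Set[str]], trace: List[str]) -> bool:
    """Accept a trace only if each transition was observed during inference."""
    prev = START
    for call in trace:
        if call not in graph.get(prev, set()):
            return False
        prev = call
    return True


# Made-up traces standing in for usage observed from test cases and seeds.
TRACES = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
AFG = build_afg(TRACES)

if __name__ == "__main__":
    print(is_valid(AFG, ["open", "write", "close"]))  # an observed sequence
    print(is_valid(AFG, ["read", "close"]))           # read before open
```

In this simplified form, any sequence containing a transition that never appeared in the observed corpus is rejected, which is one way such a graph could be used to flag unexpected API interactions.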

Communication Stacks for µServices in Datacenters

In this study, Babak Falsafi and others will investigate technologies to support communication in microservices. The research extends their prior work on tighter integration of the network with memory, with support for memory pooling and RPC scheduling. It aims to tackle the software bottleneck in communication for microservices and address challenges such as memory scalability for RPC, software stacks for high fan-out RPC processing, higher-level object access semantics via RPC to avoid multiple roundtrips, and support for data transformation across diverse language and software ecosystem boundaries. The researchers will investigate co-designed RPC technologies with hardware-terminated protocols that enable serving packets directly out of the CPU’s SRAM, eliminating DRAM capacity and bandwidth provisioning and enabling a new class of RPC substrate that is inherently technology-scalable. They propose to investigate optimizations for data transformation for common-case data formats running on conventional CPUs. They will delve into the integration of data transformation into an optimized RPC stack (from above) to identify opportunities for data placement and for reducing data movement and buffering on commodity hardware. Technologies for hardware/software co-design of data transformers will also be within the scope of the work.

The Facebook-EPFL collaborative engagement has been approved for funding for an initial period of one year, with an expected renewal each year for at least three years. Each project includes a grant of CHF 200,000 per year, which will be used to financially support one student.

For more details of the individual projects, visit:

training-for-recommendation-models-on-heterogeneous-servers

MLBench

full-system-api-inference-to-enforce-security

