Efficient distributed learning solutions, taking into account adversarial behavior in both worker-server and peer-to-peer architectures


Team

  Rachid Guerraoui


The effectiveness of distributed machine learning has been amply demonstrated in various worker-server and peer-to-peer implementations, across a wide range of applications including image classification, prediction of financial trends, disease diagnosis, autonomous driving, and gaming. Such deployments, however, assume perfectly functioning workers. In reality, some workers might exhibit arbitrary failures, also known as Byzantine behavior. The issue has become even more pronounced with the rapid growth in data and the increasing complexity of models, both of which call for spreading the computation over ever more machines, making it more likely that some of them misbehave. To be effective, a distributed learning solution must therefore be capable of tolerating the adversarial behavior of some of its workers or peers.
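As a minimal illustration of the underlying problem (a hypothetical sketch, not code from the project), the following Python snippet shows why plain gradient averaging in a worker-server setup is fragile: a single Byzantine worker sending an arbitrary vector can pull the averaged update arbitrarily far from what the honest workers report. The worker counts and values below are made up for the example.

```python
import numpy as np

# Nine honest workers report gradients close to the true one (here ~[1, 1, 1]);
# one Byzantine worker reports an arbitrary, very large vector.
rng = np.random.default_rng(0)
honest_gradients = [np.ones(3) + 0.01 * rng.standard_normal(3) for _ in range(9)]
byzantine_gradient = np.full(3, 1e6)

# The naive parameter-server update: average all reported gradients.
naive_average = np.mean(honest_gradients + [byzantine_gradient], axis=0)
print(naive_average)  # dominated by the Byzantine vector, far from [1, 1, 1]
```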

Although existing approaches, such as state machine replication protocols, address fault tolerance, they cannot do so efficiently at the scale of modern machine learning. To circumvent this problem, EPFL’s Rachid Guerraoui has undertaken a project that seeks to develop efficient distributed learning solutions that take into account adversarial behavior in both worker-server and peer-to-peer architectures.

The project builds on several recent studies by Rachid Guerraoui, which have not only yielded groundbreaking results but also encouraged further work on the problem. Last year, Prof. Guerraoui collaborated with other researchers to study Byzantine resilience at the fundamental level of stochastic gradient descent (SGD) and proposed Krum, an aggregation rule that guarantees convergence despite Byzantine workers. In another paper published earlier this year, Rachid Guerraoui and coauthors exposed the inadequacies of existing Byzantine-resilient schemes and introduced Bulyan, a more robust solution that achieves convergence without being susceptible to the attacks that undermine those schemes.
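To make the idea concrete, here is a minimal NumPy sketch of the Krum aggregation rule as described in the referenced paper: each worker's gradient is scored by the sum of squared distances to its n − f − 2 closest peers, and the gradient with the lowest score is selected. The worker count, dimensionality, and toy gradients below are hypothetical, chosen only for illustration.

```python
import numpy as np

def krum(gradients, f):
    """Select the gradient whose n - f - 2 nearest neighbours lie closest to it.

    gradients: list of 1-D NumPy arrays, one per worker.
    f: assumed number of Byzantine workers (requires n > 2f + 2).
    """
    n = len(gradients)
    assert n > 2 * f + 2, "Krum assumes n > 2f + 2 workers"
    # Pairwise squared Euclidean distances between worker gradients.
    dists = np.array([[np.sum((g_i - g_j) ** 2) for g_j in gradients]
                      for g_i in gradients])
    scores = []
    for i in range(n):
        # Distances from worker i to all other workers, sorted ascending.
        others = np.sort(np.delete(dists[i], i))
        # Score of worker i: sum of distances to its n - f - 2 closest peers.
        scores.append(np.sum(others[: n - f - 2]))
    # Return the gradient with the smallest score.
    return gradients[int(np.argmin(scores))]

# Toy usage: 7 workers, 2 of which send arbitrary (Byzantine) gradients.
rng = np.random.default_rng(0)
honest = [np.ones(4) + 0.01 * rng.standard_normal(4) for _ in range(5)]
byzantine = [np.full(4, 100.0), np.full(4, -100.0)]
print(krum(honest + byzantine, f=2))  # returns one of the honest gradients
```

Bulyan, introduced in the later paper, builds on such a rule at a high level by iterating it to pre-select a set of gradients and then aggregating them coordinate-wise around the median; that refinement is omitted here for brevity.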

Prof. Guerraoui is a Full Professor at EPFL’s School of Computer and Communication Sciences. He has a strong body of work on distributed algorithms and distributed programming languages, and has worked extensively on secure distributed storage and transactional shared memory.


https://arxiv.org/pdf/1802.07927.pdf
https://arxiv.org/pdf/1703.02757.pdf
https://pdfs.semanticscholar.org/1a92/3d5d9c4a020252cbbb2e4829670b83d76502.pdf