Distributed Machine Learning Benchmark


  Martin Jaggi


There have been rapid advancements in machine learning services and systems, extending their application to a wide range of use cases. However, some basic questions have emerged from the perspectives of both users and providers of such services: how can we improve the efficiency, ease of use, transparency, and reproducibility of distributed machine learning methods, and provide fair performance measures as well as reference implementations? The answer could lead to increased adoption of distributed machine learning methods in industry and academia alike. MLBench, a benchmarking framework for distributed machine learning, can help achieve these goals.

The main objectives of MLBench are to:

  • Serve as an easy-to-use and fair benchmarking suite for algorithms as well as for systems (software frameworks and hardware).
  • Provide reusable and reliable reference implementations of distributed ML training algorithms.

MLBench is built on Kubernetes to ease deployment in distributed settings, both on public clouds and on dedicated hardware. It supports several standard machine learning frameworks and algorithms, and can be set up with a single shell command. It comes with a convenient dashboard for managing running experiments, for example monitoring resource usage on all worker nodes. You can quickly launch the reference experiments or initiate your own, and get visualizations of your runs.

By offering precise specifications of the benchmark ML tasks and metrics as well as reference implementations, MLBench provides fair baselines and improves transparency. It supports a wide range of platforms, ML frameworks, and machine learning tasks. Our goal is to benchmark all of the currently relevant distributed execution frameworks, and we welcome contributions of new algorithms and systems to the benchmark suite.
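As a rough illustration of the Kubernetes-based setup described above, a deployment might look like the following. This is a hedged sketch, not the official installation procedure: the repository URL, chart path, release name, and the `limits.workers` value key are assumptions for illustration, and the commands require an already-configured Kubernetes cluster with Helm installed.

```shell
# Sketch of an MLBench deployment on an existing Kubernetes cluster.
# Repository URL, chart location, and value names below are assumed
# for illustration; consult the project documentation for the real ones.
git clone https://github.com/mlbench/mlbench-helm.git
cd mlbench-helm

# Install the chart as a Helm release; the worker-count override key
# is hypothetical.
helm install mlbench . --set limits.workers=4

# Locate the service exposing the dashboard, then open its endpoint
# in a browser to monitor experiments and worker resource usage.
kubectl get svc --namespace default
```

Once the release is up, experiments can be launched and monitored from the dashboard rather than by hand-managing pods on each worker node.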

MLBench consists of a public website as well as 5 GitHub repositories:

Suggested Reading