Algorithms that combine three key aspects: accelerated, non-convex, and distributed training


  Jaggi Martin


The relationship between optimization and machine learning is one of the most significant aspects of modern computational science. Extracting essential information from voluminous data requires new algorithms built on optimization formulations and methods. This is particularly relevant given the growing complexity, size, and diversity of today’s machine learning models. It is therefore necessary to reconsider existing theory and to introduce efficient optimization algorithms that bring modularity and scalability to the training process. Toward this objective, EPFL’s Martin Jaggi and co-principal investigators have developed algorithms that combine three key aspects: accelerated, non-convex, and distributed training.

In proposing convergence acceleration techniques for solving generic optimization problems, the researchers adopt a statistical view of optimization methods. Typical optimization algorithms discard the sequence of iterates and use only its last point to estimate the optimum. Martin Jaggi and his colleagues instead estimate the solution directly from the whole sequence of iterates produced by the optimization algorithm. The key benefit of the scheme is that it delivers significant speedups at minimal implementation cost and with negligible added complexity: it requires no change to existing neural network training code.
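The idea of estimating the solution from the whole sequence of iterates can be illustrated with a small sketch. The code below is not the researchers' implementation; it assumes a regularized extrapolation rule (weights that sum to one and minimize the combined step-to-step differences), a toy quadratic objective, and hypothetical names such as `gradient_descent_iterates` and `extrapolate`. The optimizer itself is left untouched; the extrapolation is a post-processing step on the stored iterates.

```python
import numpy as np

def gradient_descent_iterates(A, b, x0, step, num_steps):
    """Run plain gradient descent on f(x) = 0.5 x^T A x - b^T x, keeping every iterate."""
    iterates = [np.asarray(x0, dtype=float)]
    for _ in range(num_steps):
        x = iterates[-1]
        iterates.append(x - step * (A @ x - b))
    return np.array(iterates)

def extrapolate(iterates, reg=1e-10):
    """Estimate the limit of a sequence from its iterates (illustrative sketch).

    Finds weights c with sum(c) = 1 that minimize the norm of the weighted
    differences x_{i+1} - x_i (plus a small ridge penalty), then returns the
    weighted combination of the iterates.
    """
    X = np.asarray(iterates, dtype=float)   # shape (k + 1, d)
    R = np.diff(X, axis=0)                  # differences, shape (k, d)
    G = R @ R.T                             # Gram matrix of the differences
    scale = np.linalg.norm(G, 2)
    if scale == 0.0:                        # sequence already stationary
        return X[-1]
    k = G.shape[0]
    z = np.linalg.solve(G / scale + reg * np.eye(k), np.ones(k))
    c = z / z.sum()                         # normalize so the weights sum to one
    return c @ X[:-1]

# Toy problem (assumed for illustration): a 3-dimensional quadratic.
A = np.diag([1.0, 2.0, 4.0])
b = np.array([1.0, 2.0, 4.0])
x_star = np.linalg.solve(A, b)

iterates = gradient_descent_iterates(A, b, np.zeros(3), step=0.2, num_steps=5)
x_acc = extrapolate(iterates)

print("error of last iterate:     ", np.linalg.norm(iterates[-1] - x_star))
print("error of extrapolated point:", np.linalg.norm(x_acc - x_star))
```

On this toy problem the extrapolated point is considerably closer to the optimum than the last iterate, even though the gradient descent loop was not modified.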

In the context of non-convex optimization problems, the investigators design algorithms that address local convergence issues, with a focus on two settings: deep neural networks and matrix factorization.

The third focus area of the project is distributed optimization. To achieve the desired scalability, optimization algorithms need to transfer and manage information between distributed devices. However, existing distributed machine learning methods yield diminishing returns as the number of devices increases. In contrast, the current research brings modularity and scalability to the training process by targeting extensions that efficiently use second-order information in the distributed setting. It demonstrates improved training speed for linear machine learning models by exploiting the memory hierarchy as well as different degrees of compute parallelism between two devices.
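To make the notion of using second-order information in a distributed setting concrete, here is a minimal sketch. It is an assumption-laden illustration rather than the project's algorithm: the workers are simulated in a single process, the model is ridge regression, and the names `local_statistics` and `distributed_newton_step` are hypothetical. Each worker computes the gradient and Hessian of the loss on its own data shard, and a coordinator sums them and takes one Newton step.

```python
import numpy as np

def local_statistics(X_k, y_k, w):
    """Gradient and Hessian of 0.5 * ||X_k w - y_k||^2 on one worker's shard."""
    residual = X_k @ w - y_k
    return X_k.T @ residual, X_k.T @ X_k

def distributed_newton_step(shards, w, ridge=1e-3):
    """One Newton step on the ridge-regression objective, aggregated over shards."""
    grad = ridge * w
    hess = ridge * np.eye(w.size)
    for X_k, y_k in shards:            # in a real system, each shard lives on its own device
        g_k, H_k = local_statistics(X_k, y_k, w)
        grad += g_k                    # only d + d*d numbers are communicated per worker
        hess += H_k
    return w - np.linalg.solve(hess, grad)

# Synthetic data (assumed for illustration), split across four simulated workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = distributed_newton_step(shards, np.zeros(5))
print("distance to generating weights:", np.linalg.norm(w - w_true))
```

Because the objective here is quadratic, a single aggregated Newton step reproduces the centralized ridge solution; the raw data never leaves the workers, only the local gradient and Hessian do. For models with many parameters, sending a full Hessian is expensive, which is one reason efficient use of second-order information is a research question in its own right.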

In recognition of their project proposal (“Large-Scale Optimization: Beyond Convexity Accelerated, non-convex & distributed optimization for machine learning”), Google has awarded the researchers its Focused Research Award for 2018 in the area of Machine Learning.