Multi-Objective Machine-Learning Based Resource Management

Research Partners

Huawei Cloud BU (Huawei Cloud Business Unit)

Sources of Funding

RECIPE H2020
MANGO H2020


Description

This research line focuses on multi-objective resource management of heterogeneous High Performance Computing (HPC) servers and datacenters through machine learning-based approaches.

Our research leverages system-level resource management techniques, such as Dynamic Voltage and Frequency Scaling (DVFS), task scheduling and allocation, and thread migration, to simultaneously satisfy different design-time and run-time objectives and constraints, including power and energy consumption, temperature, performance, and Quality-of-Service.
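As a rough illustration of the idea, the sketch below uses tabular Q-learning to pick a DVFS frequency level that trades throughput against power under a power cap. Everything in it is an assumption made for illustration: the frequency levels, the two workload phases, the power cap, the toy power and throughput models, the reward weights, and all function names are hypothetical and are not taken from the publications listed on this page.

```python
import random

# All constants below are hypothetical, chosen only to make the toy example work.
FREQS_GHZ = [1.2, 1.8, 2.4, 3.0]   # assumed DVFS levels
LOADS = {"low": 0.5, "high": 1.0}  # assumed workload phases (utilization)
POWER_CAP_W = 60.0                 # assumed power budget


def toy_power(freq_ghz, load):
    """Toy power model: a static term plus a dynamic term growing with f^3."""
    return 10.0 + 2.5 * load * freq_ghz ** 3


def toy_throughput(freq_ghz, load):
    """Toy performance model: throughput scales linearly with frequency."""
    return load * freq_ghz


def reward(freq_ghz, load, w_perf=1.0, w_power=0.01, cap_penalty=5.0):
    """Scalarize the objectives: reward throughput, penalize power,
    and subtract a fixed penalty when the power cap is violated."""
    power = toy_power(freq_ghz, load)
    r = w_perf * toy_throughput(freq_ghz, load) - w_power * power
    if power > POWER_CAP_W:
        r -= cap_penalty
    return r


def train(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.
    The workload simply alternates between the two phases, so the chosen
    frequency does not influence the next state in this toy setting."""
    rng = random.Random(seed)
    phases = list(LOADS)
    n_actions = len(FREQS_GHZ)
    q = {p: [0.0] * n_actions for p in phases}
    for t in range(steps):
        phase = phases[t % 2]
        next_phase = phases[(t + 1) % 2]
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda i: q[phase][i])  # exploit
        r = reward(FREQS_GHZ[action], LOADS[phase])
        td_target = r + gamma * max(q[next_phase])
        q[phase][action] += alpha * (td_target - q[phase][action])
    return q


def greedy_policy(q):
    """Map each workload phase to the frequency with the highest Q-value."""
    return {
        p: FREQS_GHZ[max(range(len(FREQS_GHZ)), key=lambda i: q[p][i])]
        for p in q
    }


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these toy constants, the learned policy should select the highest frequency in the low-load phase and back off to 2.4 GHz in the high-load phase, where 3.0 GHz would exceed the assumed cap. A real controller would replace the toy models with measured power, temperature, and Quality-of-Service signals, and would likely need a richer state and action space (for example, task allocation and thread migration in addition to DVFS).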

Related Publications

Reinforcement Learning-Based Joint Reliability and Performance Optimization for Hybrid-Cache Computing Servers
Huang, Darong; Pahlevan, Ali; Costero, Luis; Zapater Sancho, Marina; Atienza Alonso, David
2022-03-07. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Funded by: RECIPE H2020 (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems); DeepHealth H2020 (Deep-Learning and HPC to Boost Biomedical Applications for Health); Compusapien (Next-gen computing systems inspired by the human brain).
Resource Management for Power-Constrained HEVC Transcoding Using Reinforcement Learning
Costero, Luis; Iranfar, Arman; Zapater Sancho, Marina; D. Igual, Francisco; Olcoz, Katzalin; Atienza Alonso, David
2020. IEEE Transactions on Parallel and Distributed Systems. Funded by: Compusapien (Next-gen computing systems inspired by the human brain); DeepHealth H2020 (Deep-Learning and HPC to Boost Biomedical Applications for Health).
A Machine Learning-Based Framework for Throughput Estimation of Time-Varying Applications in Multi-Core Servers
Iranfar, Arman; Silva De Souza, Wellington; Zapater Sancho, Marina; Olcoz, Katzalin; Xavier de Souza, Samuel; Atienza Alonso, David
2019. Conference Paper. Funded by: RECIPE H2020 (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems); Compusapien (Next-gen computing systems inspired by the human brain).
A Machine Learning-Based Strategy for Efficient Resource Management of Video Encoding on Heterogeneous MPSoCs
Iranfar, Arman; Simon, William Andrew; Zapater Sancho, Marina; Atienza Alonso, David
2018. Conference Paper. Funded by: MANGO H2020 (Exploring Manycore Architectures for Next-GeneratiOn HPC systems); Compusapien (Next-gen computing systems inspired by the human brain).
Machine Learning-Based Quality-Aware Power and Thermal Management of Multistream HEVC Encoding on Multicore Servers
Iranfar, Arman; Zapater Sancho, Marina; Atienza Alonso, David
2018. Journal of IEEE Transactions on Parallel and Distributed Systems (TPDS). Funded by: MANGO H2020 (Exploring Manycore Architectures for Next-GeneratiOn HPC systems); Compusapien (Next-gen computing systems inspired by the human brain).