Using Reinforcement Learning


Team

  Atienza Alonso David


 

The advent of massively parallel and heterogeneous architectures has made it necessary to co-locate applications in order to exploit the performance the underlying hardware can deliver, usually under tight system-level or application-level constraints. Such systems require holistic and autonomous resource management schemes, and Artificial Intelligence (AI) techniques can be of great help here. Indeed, developing agents that learn and improve their behavior autonomously has traditionally been one of the most important goals of AI. Because they are responsive and adapt to their environment, AI systems can sense, interact with, and react to environmental changes without human intervention.

In our line of research, we integrate Reinforcement Learning (RL), a field of AI that learns by interaction, into a self-adaptive resource manager. The manager tackles the problem of automatically and dynamically adapting application- and system-wide knobs in multi-user video transcoding scenarios on modern multi-core servers. Our study shows that RL is an effective and efficient technique for automatically extracting and applying policies that simultaneously satisfy performance, quality, and power constraints when managing resources across multiple application instances. Our proposal uses multiple agents: each agent independently explores a particular subspace, so that sufficient knowledge about the environment is acquired faster. Each agent then exploits its own knowledge together with that of the other agents, in a cooperative manner, to behave optimally in the environment.
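The idea of agents exploring separate subspaces and then exploiting pooled knowledge can be illustrated with a minimal sketch. It is not the implementation from the study: it assumes a single-state (bandit-style) simplification of RL, and the knob values, the constraint thresholds, and the `toy_reward` model are all invented for illustration.

```python
import random

# Hypothetical knob space (quantization parameter, thread count); not from the study.
ACTIONS = [(qp, threads) for qp in (22, 27, 32, 37) for threads in (1, 2, 4, 8)]
FPS_TARGET = 24.0   # assumed performance constraint (frames per second)
POWER_CAP = 60.0    # assumed power constraint (watts)


def toy_reward(action):
    """Invented stand-in for measuring one transcoding step on a real server."""
    qp, threads = action
    fps = 3.0 * threads + (qp - 22)          # more threads or coarser QP: faster
    power = 20.0 + 6.0 * threads             # more threads: more power
    quality = 45.0 - 0.5 * (qp - 22)         # coarser QP: lower quality score
    penalty = max(0.0, FPS_TARGET - fps) + max(0.0, power - POWER_CAP)
    return quality - 2.0 * penalty


class Agent:
    def __init__(self, subspace, alpha=0.5):
        self.subspace = subspace             # the only actions this agent explores
        self.alpha = alpha                   # learning rate
        self.q = {}                          # action -> estimated value

    def explore(self, rng):
        # Try every action in the subspace once, then sample at random.
        untried = [a for a in self.subspace if a not in self.q]
        action = untried[0] if untried else rng.choice(self.subspace)
        reward = toy_reward(action)
        old = self.q.get(action, reward)     # first estimate is the first reward
        self.q[action] = old + self.alpha * (reward - old)


def shared_best_action(agents):
    """Cooperative exploitation: pool all tables, then act greedily."""
    merged = {}
    for agent in agents:
        merged.update(agent.q)               # subspaces are disjoint, no conflicts
    return max(merged, key=merged.get)


def train(num_agents=4, steps_per_agent=40, seed=0):
    rng = random.Random(seed)
    # Disjoint slices of the action space; here each agent gets one thread count.
    agents = [Agent(ACTIONS[i::num_agents]) for i in range(num_agents)]
    for _ in range(steps_per_agent):
        for agent in agents:
            agent.explore(rng)
    return agents
```

In this toy model, `shared_best_action(train())` returns `(37, 4)`, the only setting that meets both assumed constraints at the highest quality. Because each agent covers a quarter of the knob space, the pooled table is complete after four exploration rounds instead of sixteen, which is the intuition behind splitting exploration across agents; a realistic manager would also need states, noisy measurements, and a proper exploration schedule.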

Our use case is multi-user video transcoding with a highly tuned HEVC encoder, modified to expose dynamic application-level knobs. The benefits of our approach show in adaptability and quality (up to 4x improvement in QoS compared to a static scheme) and in learning time (6x faster than an equivalent mono-agent implementation). Our power-capping techniques also outperform hardware-based power capping with respect to quality.

The holistic management of dynamic application- and system-level knobs can be extended with further parameters or output metrics, and to other applications, both in the multimedia area and in other fields. In addition, the architecture-related techniques used to handle system knobs can be applied, in isolation or in combination, to other present and future architectures.

Suggested Reading

https://infoscience.epfl.ch/record/278279/files/2019_MAL_Luis_Journal%281%29.pdf