Resource Management of HPC Infrastructures based on Kubernetes

Tagliani, Michele (2025) Resource Management of HPC Infrastructures based on Kubernetes. [Master's thesis], Università di Bologna, Degree Programme in Computer Engineering [LM-DM270]

Full-text documents available:

PDF document (Thesis)
Available under license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)

Abstract

With the increasing adoption of AI applications, researchers require different types of computing resources depending on their workload. High Performance Computing (HPC) infrastructure is typically favoured for tasks such as model training and classic computational applications. In contrast, Cloud environments, particularly those built on Kubernetes, are preferred for data processing tasks, inference services, and databases. Currently, HPC and Cloud clusters often run on separate infrastructures and use distinct cluster management tools. This segregation poses several problems for administrators, such as an increased operational burden from managing different systems with different tools, and inefficient utilization of valuable resources such as GPUs, which cannot be dynamically transferred between physically separate clusters, limiting scalability and leading to underutilization. In this thesis, we propose a new method that converges the management of HPC and Cloud environments at the node level, targeting Slurm and Kubernetes as the workload managers of choice. To achieve this goal, we extend the Kubernetes management tool Cluster API (CAPI) with support for Virtual Kubelets, enabling the provisioning and bootstrapping of Slurm clusters. Our solution highlights the benefits of adopting Cluster API as a unifying interface to set up and scale Kubernetes and Slurm clusters while ensuring dedicated access to assigned computing resources, thereby reducing the risk of contention.

Document type

Degree thesis (Master's degree)

Thesis author

Tagliani, Michele

Thesis supervisor

Bellavista, Paolo

School

Engineering and Architecture

Degree programme

Computer Engineering [LM-DM270]

Curriculum