Bovinelli, Riccardo
(2026)
Dynamic and Integrated HPC Cluster Provisioning with Cloud Native Technologies.
[Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in
Computer Engineering (Ingegneria informatica) [LM-DM270]
Abstract
High Performance Computing (HPC) systems are traditionally deployed as clusters of dedicated machines managed by specialized workload schedulers. Cloud-native platforms based on container orchestration technologies, by contrast, provide automated deployment and dynamic infrastructure management. Growing interest in integrating these two paradigms has motivated research on architectures that combine HPC workload managers with cloud-native orchestration environments, but several challenges remain open. This thesis investigates the feasibility and performance side effects of deploying an HPC cluster whose control plane runs inside a Slinky-instrumented Kubernetes environment and whose worker nodes are dynamically provisioned as bare-metal machines through the Kubernetes Cluster API. The work aims to determine whether such an architecture can manage HPC workloads while enabling automated deployment and dynamic scaling. To this end, an experimental architecture was designed around the Metal3 infrastructure provider for Cluster API and an original solution, called SliMe, which is the central contribution of this thesis. SliMe addresses the required technological integrations and enables dynamic provisioning. The architecture represents a further step towards the convergence of traditional HPC systems and cloud computing. The evaluation covers both the latency introduced by SliMe and a set of benchmarks measuring computational performance, network bandwidth, and MPI communication latency in the proposed architecture.
Document type
Degree thesis
(Master's degree, Laurea magistrale)
Thesis author
Bovinelli, Riccardo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Computer Engineering (Ingegneria informatica) curriculum
Degree programme regulations
DM270
Keywords
HPC, Cloud Native Technologies, Converged Computing, Dynamic Provisioning, Kubernetes, Slinky, Slurm, Cluster API
Thesis defence date
26 March 2026
URI