CLUEstering: a high-performance density-based clustering library for scientific computing

Balducci, Simone (2024) CLUEstering: a high-performance density-based clustering library for scientific computing. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Physics [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

Clustering is a computational technique that classifies objects based on their similarity. It is widely used across many branches of science, for instance in image segmentation, medical imaging, the study of complex systems, machine learning and high-energy physics. As the amount of data collected in every field of research grows, and grows faster than hardware performance improves, techniques like clustering must handle ever larger datasets as efficiently as possible. In recent decades, parallel processors such as GPUs and FPGAs have risen in popularity thanks to their ability to perform complex calculations very efficiently by executing a large number of operations in parallel. The purpose of this thesis is to develop a general-purpose clustering library based on CLUE, a highly parallel density-based clustering algorithm used for the local reconstruction of hits in the high-granularity calorimeters of the CMS detector at CERN. CLUEstering is built on Alpaka, a C++ performance portability library that makes it possible to write code that runs on many types of modern processors with near-native efficiency and without code duplication. The library provides a Python interface to the C++ backend, which makes it easier to use and appealing to a wider range of users. Finally, the library was tested on selected datasets to assess the quality of its reconstruction and to benchmark its performance. To demonstrate its generality, it was also applied to two current problems from separate areas of science: vertex reconstruction in high-energy physics and star detection from PSF images in astronomy.
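
The abstract does not describe the steps of a density-based clustering algorithm, so the following is only a simplified illustration of the general idea behind density-peak methods such as CLUE: estimate a local density for each point, link each point to its nearest neighbour of higher density, promote dense and well-separated points to cluster seeds, and let the remaining points follow their link. It is a brute-force O(n²) toy in plain Python, not the CLUEstering API and not the thesis's Alpaka implementation. The function name `density_peak_clusters` and the parameters `dc`, `rhoc` and `delta_seed` are hypothetical, and using a single separation threshold for both seeds and outliers is a simplifying assumption.

```python
import math


def density_peak_clusters(points, dc, rhoc, delta_seed):
    """Toy density-based clustering; returns one label per point (-1 = outlier).

    dc         -- radius used to count neighbours (local density)
    rhoc       -- minimum density a point needs to become a cluster seed
    delta_seed -- minimum distance to a denser point for a seed (assumed
                  here to double as the outlier separation threshold)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of points within dc, the point itself included.
    rho = [sum(1 for d in row if d <= dc) for row in dist]

    # Nearest point of higher density. Density ties are broken by index,
    # so the "higher than" relation never forms a cycle.
    nearest_higher = [-1] * n
    delta = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if (rho[j], j) > (rho[i], i) and dist[i][j] < delta[i]:
                delta[i] = dist[i][j]
                nearest_higher[i] = j

    labels = [-1] * n
    next_label = 0
    # Visit points from highest to lowest density, so that the point a
    # follower is linked to has always been labelled before the follower.
    for i in sorted(range(n), key=lambda k: (rho[k], k), reverse=True):
        if rho[i] >= rhoc and delta[i] > delta_seed:
            labels[i] = next_label  # seed: dense and far from denser points
            next_label += 1
        elif delta[i] <= delta_seed:
            labels[i] = labels[nearest_higher[i]]  # follower
        # otherwise: sparse and isolated, so the point stays an outlier (-1)
    return labels


# Two tight groups of three points each, plus one isolated point.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
labels = density_peak_clusters(points, dc=0.5, rhoc=2, delta_seed=1.0)
print(labels)
```

With these inputs each group of three points shares one label, the two groups get different labels, and the isolated point is marked `-1`. Because each point's density and nearest-higher search are independent of the others, these steps lend themselves to the kind of parallel execution the abstract refers to.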

Document type
Master's thesis (Laurea magistrale)
Thesis author
Balducci, Simone
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Applied Physics
Degree programme regulations
DM270
Keywords
Clustering, Parallel computing, Heterogeneous GPU computing, High-energy Physics, Astronomy, Machine learning
Thesis defence date
20 September 2024
URI
