Sani, Lorenzo
(2022)
Unsupervised clustering of MDS data using federated learning.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in
Physics [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.
Download (3MB)
Abstract
In this master's thesis we developed a model for unsupervised clustering of a biomedical data set. The data were collected by the GenoMed4All consortium from patients affected by Myelodysplastic Syndrome (MDS), a haematological disease. The main focus is on the patients' genetic mutations, which are used as the features for clustering. Clustering approaches have been used in several studies of haematological diseases such as MDS. A neural network-based model was used to solve the task. The clustering results were compared with labels from a "gold standard" technique, namely hierarchical Dirichlet processes (HDP). Our model was also designed to be implemented in a federated learning (FL) setting. This technique can achieve a machine learning objective without collecting all the data in a single centre, which allows strict privacy policies to be respected. Federated learning was chosen for these properties and because of the sensitivity of the data. Several recent studies that address clinical problems with machine learning endorse the development of federated learning settings in this context, because its privacy-preserving properties could be a cornerstone for applying machine learning techniques to medical data. This work then discusses the clustering performance of the model as well as its generative capabilities.
Document type
Degree thesis
(Master's degree)
Thesis author
Sani, Lorenzo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Track
Applied Physics
Degree programme regulations
DM270
Keywords
clustering, unsupervised machine learning, federated learning, clinical data, classification, neural network
Thesis defence date
25 March 2022
URI