Dell'atti, Chiara
(2026)
Machine Learning Modelling of Exhaled Breath Mass Spectroscopy Data.
[Master's thesis], Università di Bologna, Degree programme in
Biomedical engineering [LM-DM270] - Cesena, full-text document not available
The full text is not available, by the author's choice.
Abstract
This thesis examines the potential of human breath analysis for diagnosing respiratory diseases by combining mass spectrometry with supervised multivariate data analysis. Two distinct case studies are considered: COVID-19 infection detection and lung cancer discrimination. The two studies use different instrumental and analytical pipelines that preserve the original data shape.
In the COVID-19 study, breath data are acquired using a column-free mass spectrometry system based on nanoporous membrane technology, which provides full mass spectra. Classification models are built directly on raw spectral data using minimal preprocessing pipelines to preserve fidelity to the instrument output and ensure real-time compatibility. Within this framework, both linear (PLS-DA) and non-linear (SVM-DA) models achieve excellent performance on an external test set, with areas under the curve (AUC) of 0.96 and 0.98, respectively. Model outputs are then interpreted using a Bayesian inference framework that incorporates disease prevalence as a prior probability, enabling a clinically realistic interpretation of classification results.
The lung cancer analysis instead relies on data acquired via conventional GC-MS, where spectral information is reduced to a set of pre-identified VOCs and VOC ratios. Here too, a supervised non-linear machine learning model (SVM-DA) is adopted. The models demonstrate significant discriminative capability, with AUC values up to 0.93, while highlighting the intrinsic challenges posed by overlapping metabolic signatures across different pathological conditions.
This work demonstrates that methodological simplicity, minimal preprocessing, and strict adherence to experimental data can constitute a robust and realistic strategy for breath-based diagnostic applications.
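The role of disease prevalence as a prior, mentioned in the abstract, can be illustrated with a minimal sketch of Bayes' rule for a binary test. This is not the thesis's actual framework, which is not described in this record. It assumes the classifier output has already been thresholded into a positive or negative call, and the function name `posterior_probability` and the sensitivity, specificity, and prevalence values are hypothetical.

```python
def posterior_probability(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return P(disease | positive result) using Bayes' rule.

    The prevalence acts as the prior probability of disease.
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical operating point (not taken from the thesis): the same
# classifier gives very different post-test probabilities as prevalence changes.
for prevalence in (0.01, 0.10, 0.50):
    post = posterior_probability(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f} -> P(disease | positive)={post:.3f}")
```

With a fixed sensitivity and specificity, the post-test probability falls sharply at low prevalence, which is why a high AUC alone does not determine how a positive result should be read in a given population.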
Document type
Degree thesis
(Master's degree)
Thesis author
Dell'atti, Chiara
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM INNOVATIVE TECHNOLOGIES IN DIAGNOSTICS AND THERAPY
Degree programme regulations
DM270
Keywords
Mass Spectroscopy, MS, GCMS, Multivariate Data Analysis, Machine Learning, SVM-DA, PLS-DA, Covid19, Lung Cancer
Thesis defence date
12 March 2026
URI