Aloi, Pietro
(2024)
Classification of Newborn's Cry Melody via Deep Learning Methods.
[Master's thesis], Università di Bologna, Degree Programme in
Biomedical engineering [LM-DM270] - Cesena, restricted-access document.
Abstract
Crying is the first form of communication for newborns and requires the coordinated effort of several systems. Acoustic cry analysis and artificial intelligence-based methods can identify early biomarkers for diseases or neurodevelopmental disorders, using the cry fundamental frequency (F0) and its changes over time (i.e., melodic shapes). This thesis aims to develop and test deep learning algorithms that classify, for the first time, five basic melodic shapes of newborns' cries using spectrograms and F0 time series from real and synthetic datasets obtained from 28 full-term, healthy newborns. An augmented dataset was created by combining the two. These datasets were used to train several convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to identify the best architecture and the most informative features for classifying the five melodic shapes. Additionally, the best architectures were integrated into a multi-stream neural network to determine whether performance could be enhanced by combining different types of information.
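To make the notion of an F0 time series and its melodic shape concrete, the following is a minimal sketch in plain Python. It is not the thesis's pipeline: the autocorrelation estimator, the synthetic chirp standing in for a cry, the 300-550 Hz search range, the slope threshold, and the three labels (`rising`, `falling`, `flat`) are all illustrative assumptions, and they do not correspond to the five melodic shapes studied in the thesis.

```python
import math


def synth_chirp(f_start, f_end, duration=0.4, fs=8000):
    """Synthetic tone whose frequency moves linearly from f_start to f_end (Hz).

    Hypothetical stand-in for a cry segment; not real cry data.
    """
    n = round(duration * fs)
    out = []
    for i in range(n):
        t = i / fs
        # Phase is 2*pi times the integral of the instantaneous frequency.
        phase = 2 * math.pi * (f_start * t + (f_end - f_start) * t * t / (2 * duration))
        out.append(math.sin(phase))
    return out


def frame_f0(frame, fs, f_min=300.0, f_max=550.0):
    """Estimate F0 of one frame as fs / (lag of the autocorrelation maximum).

    The search range is an assumption chosen so that only one pitch period
    fits inside it, which avoids octave errors in this toy setting.
    """
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def f0_track(signal, fs, frame_len=400, hop=200):
    """Return (frame centre times in s, F0 estimates in Hz) over sliding frames."""
    times, f0s = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        times.append((start + frame_len / 2) / fs)
        f0s.append(frame_f0(frame, fs))
    return times, f0s


def classify_contour(times, f0s, min_slope_hz_per_s=50.0):
    """Label a contour by the sign of its least-squares slope (toy rule)."""
    mean_t = sum(times) / len(times)
    mean_f = sum(f0s) / len(f0s)
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0s))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den
    if slope > min_slope_hz_per_s:
        return "rising"
    if slope < -min_slope_hz_per_s:
        return "falling"
    return "flat"


if __name__ == "__main__":
    times, f0s = f0_track(synth_chirp(350.0, 500.0), 8000)
    print(classify_contour(times, f0s))
```

A sequence such as `f0s` is the kind of input an RNN would consume, whereas a spectrogram of the same signal would be the image-like input to a CNN. The slope rule here only illustrates what "shape of the F0 contour" means and is far simpler than a learned classifier.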
The results show that combining synthetic and real data in the augmented dataset significantly improves classification accuracy, which reaches 76.7% when F0 time series are used to train a customized single-stream RNN architecture combined with a gated recurrent unit (GRU). For spectrogram images, the Visual Geometry Group (VGG)-16 model (a single-stream CNN with 13 convolutional layers and 3 fully connected layers) is the most effective, with an accuracy of 67.9%. When the F0 waveform and the spectrogram are analyzed together by the multi-stream neural network, accuracy rises to 78.3%.
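The layer count quoted for VGG-16 (13 convolutional plus 3 fully connected layers) can be checked with a short sketch that walks the standard published VGG-16 configuration. The 224x224 RGB input, the 4096-unit fully connected layers, and the 5-class output head are assumptions made for illustration; the abstract does not state the input size or the classifier head used in the thesis.

```python
# Standard VGG-16 convolutional stack: numbers are output channels of
# 3x3 convolutions (padding 1), "M" is a 2x2 max-pool with stride 2.
VGG16_CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"]


def summarize_vgg16(input_size=224, in_channels=3, num_classes=5, fc_width=4096):
    """Count weight layers, final feature-map size, and parameters.

    input_size, fc_width, and num_classes are illustrative assumptions,
    not values reported in the abstract.
    """
    size, channels = input_size, in_channels
    n_conv, n_params = 0, 0
    for item in VGG16_CONV_CFG:
        if item == "M":
            size //= 2                                    # pooling halves height and width
        else:
            n_params += 3 * 3 * channels * item + item    # weights + biases
            channels = item
            n_conv += 1
    # Three fully connected layers on the flattened feature map.
    fc_dims = [size * size * channels, fc_width, fc_width, num_classes]
    n_fc = len(fc_dims) - 1
    for fan_in, fan_out in zip(fc_dims, fc_dims[1:]):
        n_params += fan_in * fan_out + fan_out
    return {
        "conv_layers": n_conv,
        "fc_layers": n_fc,
        "weight_layers": n_conv + n_fc,
        "final_feature_map": (size, size, channels),
        "parameters": n_params,
    }


if __name__ == "__main__":
    for key, value in summarize_vgg16().items():
        print(f"{key}: {value}")
```

The "16" in the name is the sum of the 13 convolutional and 3 fully connected weight layers; pooling layers carry no weights and are not counted.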
Future studies should aim to improve classification performance by acquiring larger datasets, possibly including individuals at risk of neurodevelopmental disorders, to explore the potential for integrating these tools into routine clinical practice.
Document type
Degree thesis
(Master's degree)
Thesis author
Aloi, Pietro
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM BIOMEDICAL ENGINEERING FOR NEUROSCIENCE
Degree programme regulations
DM270
Keywords
Infant, Cry, Melody, Classification, Deep, Learning, Convolutional, Neural, Networks, Recurrent, Multi-stream.
Thesis defense date
13 June 2024
URI