Classification of Newborn's Cry Melody via Deep Learning Methods

Aloi, Pietro (2024) Classification of Newborn's Cry Melody via Deep Learning Methods. [Master's degree thesis], Università di Bologna, Degree programme in Biomedical engineering [LM-DM270] - Cesena, restricted-access document.

Available full-text documents:

PDF document (Thesis)
Full text not accessible until 30 June 2027.
Available under licence: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)

Abstract

Crying is the first form of communication for newborns and requires the coordinated effort of several systems. Acoustic cry analysis and artificial intelligence-based methods can identify early biomarkers of diseases or neurodevelopmental disorders, using the cry fundamental frequency (F0) and its changes over time (i.e., melodic shapes). This thesis aims to develop and test deep learning algorithms to classify, for the first time, five basic melodic shapes of newborns’ cries using spectrograms and F0 time series from real and synthetic datasets obtained from 28 full-term, healthy newborns. An augmented dataset was created by combining the two. These datasets were used to train several convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in order to identify the best architecture and the most informative features for classifying the five melodic shapes. Additionally, the best architectures were integrated into a multi-stream neural network to determine whether performance could be enhanced by combining different types of information. The results showed that combining synthetic data with real data in the augmented dataset significantly improves classification accuracy, which reaches up to 76.7% when F0 time series are used to train a customized single-stream RNN architecture combined with a gated recurrent unit (GRU). In contrast, the Visual Geometry Group (VGG)-16 model (a single-stream CNN consisting of 13 convolutional layers and 3 fully connected layers) is the most effective for analyzing spectrogram images (67.9% accuracy). When both the F0 waveform and the spectrogram are analyzed with the multi-stream neural network, accuracy rises to up to 78.3%. Future studies should aim to improve classification performance by acquiring larger datasets, possibly including individuals at risk of neurodevelopmental disorders, to explore the potential for integrating these tools into routine clinical practice.

Document type

Degree thesis (Master's degree)

Thesis author

Aloi, Pietro

Thesis supervisor

Orlandi, Silvia

Thesis co-supervisors

Bandini, Andrea ; Mellone, Sabato

School

Engineering and Architecture

Degree programme