Voice conversion with pre-trained representations for audio anonymization

Costante, Marco (2023) Voice conversion with pre-trained representations for audio anonymization. [Master's degree thesis], Università di Bologna, Degree programme in Artificial Intelligence [LM-DM270]. Full-text document not available.
The full text is not available at the author's request.


The recording and processing of voice data raises increasing privacy concerns for users and service providers. One way to address these concerns is to move processing to the edge device, close to where the audio is recorded, so that potentially identifying information is not transmitted over the internet. However, this is often not possible due to hardware limitations. An interesting alternative is to develop voice anonymization techniques that remove the characteristics of individual speakers while preserving the linguistic and acoustic information in the data. In this work, a state-of-the-art approach to sequence-to-sequence speech conversion, initially based on x-vectors and on bottleneck features from an automatic speech recognition model, is explored to disentangle these two kinds of information (speaker identity and linguistic content) using different pre-trained speech and speaker representations. Furthermore, different strategies for selecting the target speech representations are analyzed. Results on public datasets, measured in terms of equal error rate (EER) and word error rate (WER), show that good privacy is achieved with a limited impact on the quality of the converted speech relative to the original method.
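The results are reported as equal error rate and word error rate. In speaker-anonymization evaluations such as the VoicePrivacy Challenge, a higher EER for an attacker's speaker-verification system indicates better privacy, while a lower WER from a speech recognizer on the converted speech indicates better preserved content. The following is a minimal Python sketch of what the two metrics compute. The function names `equal_error_rate` and `word_error_rate` are hypothetical, the scores are toy values rather than results from the thesis, and the EER is approximated by a simple threshold sweep rather than the ROC interpolation used by standard toolkits.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping every observed score as a decision threshold.

    Hypothetical helper: `genuine` holds verification scores for same-speaker
    trials, `impostor` for different-speaker trials (higher = more similar).
    """
    thresholds = sorted(set(genuine) | set(impostor)) + [float("inf")]
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: impostor trials accepted at threshold t.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine trials rejected at threshold t.
        frr = sum(s < t for s in genuine) / len(genuine)
        # Keep the threshold where FAR and FRR are closest to each other.
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy scores and sentences only; not data from the thesis.
    print(equal_error_rate([0.6, 0.8], [0.4, 0.7]))  # 0.5
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
```

In this reading, an anonymization system tries to raise the attacker's EER (ideally toward chance level) while keeping the WER of converted speech close to that of the original recordings, which is the trade-off the abstract summarizes.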

Document type
Thesis (Master's degree)
Thesis author
Costante, Marco
Keywords
speaker anonymization, voice conversion, privacy preserving, pre-trained unsupervised models, speech representations
Thesis defence date
3 February 2023