Voice conversion with pre-trained representations for audio anonymization

Costante, Marco (2023) Voice conversion with pre-trained representations for audio anonymization. [Master's degree thesis], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], Full-text document not available

The full text is not available, by the author's choice.

Abstract

The recording and processing of voice data raise increasing privacy concerns for users and service providers. One way to address these concerns is to move processing onto the edge device, closer to the recording, so that potentially identifying information is not transmitted over the internet. However, this is often not possible due to hardware limitations. An interesting alternative is the development of voice anonymization techniques that remove individual speaker characteristics while preserving the linguistic and acoustic information in the data. In this work, a state-of-the-art approach to sequence-to-sequence speech conversion, initially based on x-vectors and bottleneck features for automatic speech recognition, is explored to disentangle these two kinds of acoustic information using different pre-trained speech and speaker representations. Furthermore, different strategies for selecting target speech representations are analyzed. Results on public datasets, in terms of equal error rate and word error rate, show that good privacy is achieved with limited impact on converted speech quality relative to the original method.
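The two metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the evaluation code of the thesis: the function names, the whitespace tokenization, and the simple threshold sweep are assumptions made for illustration. In anonymization evaluations, a higher equal error rate for a speaker verification attacker generally indicates better privacy, while a lower word error rate indicates that the linguistic content survived conversion.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of the false acceptance and false rejection rates at the
    threshold where the two are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` counts one substitution over six reference words. Production evaluations typically rely on established toolkits and interpolated error curves, not on a discrete sweep like this one.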

Document type

Degree thesis (Master's degree)

Thesis author

Costante, Marco

Thesis supervisor

Torroni, Paolo

Thesis co-supervisor

Matassoni, Marco ; Brutti, Alessio

School

Engineering and Architecture

Degree programme