Voice conversion with pre-trained representations for audio anonymization

Costante, Marco (2023) Voice conversion with pre-trained representations for audio anonymization.
The recording and processing of voice data raises growing privacy concerns for users and service providers. One way to address these concerns is to move processing onto the edge device, closer to where the audio is recorded, so that potentially identifiable information is not transmitted over the internet. However, this is often not possible due to hardware limitations. An interesting alternative is the development of voice anonymization techniques that remove speaker-specific characteristics while preserving the linguistic and acoustic information in the data. In this work, a state-of-the-art approach to sequence-to-sequence voice conversion, initially based on x-vectors and bottleneck features for automatic speech recognition, is explored to disentangle these two types of information using different pre-trained speech and speaker representations. Furthermore, different strategies for selecting target speech representations are analyzed. Results on public datasets, measured in terms of equal error rate and word error rate, show that good privacy is achieved with limited impact on converted speech quality relative to the original method.
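The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the evaluation pipeline used in the thesis: the function names `equal_error_rate` and `word_error_rate` and the toy scores are hypothetical, and the EER is approximated by a simple threshold sweep. In anonymization evaluations, a higher EER for an attacker's speaker verifier generally indicates better privacy, while a lower WER indicates that linguistic content is better preserved.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    target_scores: verifier scores for same-speaker trials.
    nontarget_scores: verifier scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Toy values, invented purely for illustration.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

The threshold sweep only evaluates thresholds at observed scores, so on small trial lists it gives a coarse estimate; evaluation toolkits typically interpolate the error curves instead.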

Master's thesis (Laurea magistrale)
Costante, Marco
speaker anonymization, voice conversion, privacy preserving, pre-trained unsupervised models, speech representations
3 February 2023

