Conditioning Text-to-Speech synthesis on dialect accent: a case study

Falai, Alessio (2022) Conditioning Text-to-Speech synthesis on dialect accent: a case study. [Laurea magistrale], Università di Bologna, Corso di Studio in Artificial intelligence [LM-DM270]
Documenti full-text disponibili:
[img] Documento PDF (Thesis)
Disponibile con Licenza: Creative Commons: Attribuzione - Condividi allo stesso modo 4.0 (CC BY-SA 4.0)

Download (14MB)

Abstract

Modern text-to-speech systems are modular in many different ways. In recent years, end-users gained the ability to control speech attributes such as degree of emotion, rhythm and timbre, along with other suprasegmental features. More ambitious objectives are related to modelling a combination of speakers and languages, e.g. to enable cross-speaker language transfer. Though, no prior work has been done on the more fine-grained analysis of regional accents. To fill this gap, in this thesis we present practical end-to-end solutions to synthesise speech while controlling within-country variations of the same language, and we do so for 6 different dialects of the British Isles. In particular, we first conduct an extensive study of the speaker verification field and tweak state-of-the-art embedding models to work with dialect accents. Then, we adapt standard acoustic models and voice conversion systems by conditioning them on dialect accent representations and finally compare our custom pipelines with a cutting-edge end-to-end architecture from the multi-lingual world. Results show that the adopted models are suitable and have enough capacity to accomplish the task of regional accent conversion. Indeed, we are able to produce speech closely resembling the selected speaker and dialect accent, where the most accurate synthesis is obtained via careful fine-tuning of the multi-lingual model to the multi-dialect case. Finally, we delineate limitations of our multi-stage approach and propose practical mitigations, to be explored in future work.

Abstract
Tipologia del documento
Tesi di laurea (Laurea magistrale)
Autore della tesi
Falai, Alessio
Relatore della tesi
Correlatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
Text-To-Speech,TTS,voice conversion,accent conversion,dialect accent
Data di discussione della Tesi
22 Marzo 2022
URI

Altri metadati

Statistica sui download

Gestione del documento: Visualizza il documento

^