Return to the Source: Assessing Machine Translation Suitability based on the Source Text using XLM-RoBERTa

Fernicola, Francesco (2022) Return to the Source: Assessing Machine Translation Suitability based on the Source Text using XLM-RoBERTa. [Master's degree thesis], Università di Bologna, Degree Programme in Specialized translation [LM-DM270] - Forlì
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)



Many factors determine whether a text is suitable for machine translation (MT), and they often vary across language pairs. Readability accounts for part of the problem, but the metrics used to evaluate it are either inherently monolingual (e.g., the Gunning fog index) or designed with language learning in mind. As a result, they capture only the difficulties a human reader faces when approaching a text, such as text length or overly complex syntax. Although these aspects may well correlate with greater difficulty for an automatic translation process, they treat the issue purely as a comprehension problem in the source text. In real-world scenarios, by contrast, most of the attention is on the target text, and in particular on the essential cross-language aspects of terminology and target-language pragmatics. This dissertation is an attempt to address the problem by transferring knowledge from established MT evaluation metrics to a new model able to predict MT quality from the source text alone. To open the door to experiments in this direction, we explore the fine-tuning of a state-of-the-art transformer model (XLM-RoBERTa), framing the problem both as single-task and as multi-task. Results for this methodology are promising: both model types appear able to approximate well-established MT evaluation and quality estimation metrics, achieving low RMSE values in the 0.1-0.2 range.
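For readers unfamiliar with the error measure quoted in the abstract, the following is a minimal sketch of how RMSE could be computed between model predictions and reference metric scores, one value per target metric as a multi-task setup would produce. The metric names (`metric_a`, `metric_b`) and all score values are hypothetical placeholders chosen for illustration; they are not taken from the thesis, and the thesis may compute or aggregate the error differently.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between two equal-length score sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data, NOT from the thesis: reference scores from two
# MT metrics and the scores a source-only model predicted for the
# same three sentences.
reference_scores = {
    "metric_a": [0.80, 0.55, 0.30],
    "metric_b": [0.70, 0.60, 0.20],
}
predicted_scores = {
    "metric_a": [0.70, 0.65, 0.40],
    "metric_b": [0.75, 0.50, 0.35],
}

# One RMSE per target metric, as a multi-task model would be evaluated
# if each metric is scored separately (an assumption of this sketch).
per_metric_rmse = {
    name: rmse(predicted_scores[name], reference_scores[name])
    for name in reference_scores
}

for name, value in per_metric_rmse.items():
    print(f"{name}: RMSE = {value:.3f}")
```

RMSE is expressed in the same units as the scores being predicted, so a value in the 0.1-0.2 range is only meaningful relative to the scale of the target metric.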

Document type
Degree thesis (Master's degree)
Thesis author
Fernicola, Francesco
Thesis supervisor
Thesis co-supervisor
Degree programme
Degree programme regulations
Keywords
MT, Machine Translation, Machine Learning, Deep Learning, Quality Estimation, MT Evaluation, Computational Linguistics, NLP, Natural Language Processing, Source Text, Target Text, Pragmatics, XLM-R, Transformer, Encoder-Decoder, XLM-RoBERTa, HuggingFace, Sequence Classification, COMET, BERTScore, WMT, hLEPOR, BLEU, TER, fine-tuning, Encoder, Decoder, BERT, Language Model, multi-task
Thesis defence date
15 March 2022
