Abstract
Assessing the suitability of a text for machine translation (MT) involves many factors, which often vary across language pairs. Readability certainly accounts for part of the problem, but the metrics used to evaluate it are either inherently monolingual (e.g., the Gunning fog index) or target language learners, and thus consider only the difficulties humans face when approaching a text, such as text length or overly complex syntax. Although these aspects may also make a text harder to translate automatically, they frame the problem purely as one of source-text comprehension, whereas in real-world scenarios most of the attention falls on the target text, and in particular on the essential cross-language aspects of terminology and target-language pragmatics. This dissertation approaches the problem by transferring knowledge from established MT evaluation metrics to a new model that predicts MT quality from the source text alone. To open the door to experiments in this direction, we explore fine-tuning a state-of-the-art transformer model (XLM-RoBERTa), framing the problem both as single-task and as multi-task regression. Results for this methodology are promising: both model types appear able to approximate well-established MT evaluation and quality estimation metrics, achieving low RMSE values in the [0.1, 0.2] range.
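To make the setup concrete, the following is a minimal sketch, not taken from the thesis itself, of how such a source-only quality regressor could be fine-tuned with the Hugging Face Transformers library. The checkpoint name, the toy sentence, and the score value are illustrative assumptions; the single-task case uses one regression output, while a multi-task variant would set the number of outputs to the number of metrics being approximated.

```python
# Minimal sketch: fine-tuning XLM-RoBERTa to predict an MT quality score
# from the source text alone (assumptions: xlm-roberta-base checkpoint,
# Hugging Face Transformers, fabricated toy data).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=1,               # single-task: one quality score per source text;
                                # a multi-task variant would set num_labels to the
                                # number of metrics being approximated jointly
    problem_type="regression",  # selects MSELoss instead of cross-entropy
)

# Toy batch: a source-language sentence paired with a quality score taken
# from an established MT metric (the value here is fabricated).
sources = ["The agreement enters into force next month."]
scores = torch.tensor([[0.83]])

batch = tokenizer(sources, return_tensors="pt", padding=True, truncation=True)
outputs = model(**batch, labels=scores)
outputs.loss.backward()        # one fine-tuning step; pair with an optimizer in practice
print(outputs.logits.item())   # predicted quality score for the source sentence
```

Since the regression head only sees the source sentence, the target-side signal enters entirely through the training labels, which is what allows the fine-tuned model to estimate translation quality before any translation is produced.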