Evaluating Domain Adaptation in Neural Machine Translation and Large Language Models: Insights from the TICO-19 Benchmark

Galiero, Lucia (2025) Evaluating Domain Adaptation in Neural Machine Translation and Large Language Models: Insights from the TICO-19 Benchmark. [Master's thesis], Università di Bologna, Degree programme in Specialized Translation [LM-DM270] - Forlì
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)

Abstract

This thesis examines the impact of domain adaptation on the performance of an adaptive Neural Machine Translation (NMT) system (ModernMT) and a Large Language Model (LLM), LLaMa 3.2 90B, in the English-Italian language pair. The analysis is based on an experiment using data from the TICO-19 benchmark, a multilingual dataset developed to support translation efforts during the COVID-19 pandemic. Since no Italian version of the benchmark is currently available, a preliminary phase involved manually translating and aligning two selected academic-scientific articles to create a high-quality reference set. The research is framed within the broader context of Crisis Translation, a growing field in Translation Studies that investigates the role of linguistic mediation in emergency scenarios. Particular attention is given to Crisis Machine Translation (CMT), an emerging subfield exploring how MT systems can be optimized for use in crisis contexts, where the speed and accuracy of multilingual communication are paramount. The evaluation combines automatic metrics (BLEU, chrF3, COMET) with a human assessment phase to determine whether domain-adapted LLMs can match or surpass an adaptive NMT system. The findings reveal that, despite the increasing capabilities of LLMs, the domain-adapted NMT system consistently outperforms the LLM, challenging the assumption that LLMs inherently excel in specialized translation tasks. The thesis is divided into four main sections: an introduction to Crisis Translation and its applications, a discussion of the technologies employed, an overview of the experimental methodology, and a final analysis of the results. This research contributes to the growing discussion on the feasibility of integrating LLMs into domain-adaptive machine translation and provides insights into the practical implications of deploying such technologies in crisis scenarios.

Document type: Degree thesis (Master's degree)
Thesis author: Galiero, Lucia
Degree programme: Specialized Translation [LM-DM270] - Forlì
Curriculum: CURRICULUM TRANSLATION AND TECHNOLOGY
Degree programme regulations: DM270
Keywords: Neural Machine Translation, Large Language Models, Crisis Translation, TICO-19, Domain Adaptation, COVID-19, PubMed, ModernMT, LLaMa, Machine Translation
Thesis defence date: 18 March 2025