Bridging the Resource Gap: Fine-tuning a Machine Translation System for English-Swahili Medical Translation in Kenya's Low-Resource Context

Di Bonaventura, Jennifer (2026) Bridging the Resource Gap: Fine-tuning a Machine Translation System for English-Swahili Medical Translation in Kenya's Low-Resource Context. [Master's degree thesis], Università di Bologna, Degree programme in Specialized translation [LM-DM270] - Forlì, Restricted-access document.

Full-text documents available:

PDF document (Thesis)
Full text accessible only to the University's institutional users
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.
Abstract

In Kenya, the predominance of English in healthcare settings creates a significant barrier for Swahili speakers, compromising access to information for most Kenyan citizens. This thesis investigates how to address this gap by fine-tuning a specialized English-Swahili machine translation system for the medical domain. The study involved the creation of the AFYA corpus, an original parallel dataset of approximately 15 000 segments, which was used to fine-tune an NLLB machine translation model. The model's performance was evaluated against Google Translate through a hybrid framework that combined automated metrics (COMET, SacreBLEU and chrF2) with a human evaluation task based on the MQM framework. Results showed that while Google Translate offers higher Fluency, NLLB achieves superior semantic alignment and statistical parity in clinical safety (Verity). Crucially, human analysis revealed that the performance of Google Translate was inconsistent, resulted in a negative Inter-Annotator Agreement, and often masked terminological errors. In contrast, NLLB’s errors were more obvious and transparent, and thus easier to spot and correct through post-editing. This study argues for the need to preserve linguistic diversity in specialized fields and provides a framework for more accessible healthcare-related information in Kenya.
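A negative Inter-Annotator Agreement means that annotators agreed less often than chance alone would predict. The abstract does not say which agreement statistic the thesis uses; the sketch below assumes Cohen's kappa for two annotators purely as an illustration. The function name `cohen_kappa` and the severity labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical severity labels for six segments (illustrative, not thesis data).
annotator_1 = ["minor", "major", "minor", "major", "minor", "major"]
annotator_2 = ["major", "minor", "minor", "minor", "major", "major"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")  # prints "kappa = -0.333"
```

Here the annotators agree on two of six segments (observed agreement 0.33) while chance agreement is 0.5, so kappa falls below zero. Values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.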

Document type

Degree thesis (Master's degree)

Thesis author

Di Bonaventura, Jennifer

Thesis supervisor

Bernardini, Silvia

Thesis co-supervisors

Barron Cedeno, Luis Alberto ; Gajo, Paolo ; Gitonga, Josephat John

School

Languages and Literatures, Translation and Interpreting

Degree programme