Breaking gender bias in Machine Translation: Expanding the GeNTE corpus and exploring LLMs (inclusive) capabilities

Cupin, Eleonora (2024) Breaking gender bias in Machine Translation: Expanding the GeNTE corpus and exploring LLMs (inclusive) capabilities. [Master's degree thesis], Università di Bologna, Degree programme in Specialized translation [LM-DM270] - Forlì, Restricted-access document.

Full-text documents available:

PDF document (Thesis)
Full text accessible only to institutional users of the University
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed for strictly personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

Gender bias, deeply rooted in our society, not only shapes our communication practices but is also reflected and reinforced within the language technologies we use. To counter this phenomenon, social movements have pushed for new inclusive approaches, one of which is gender-neutral language. Recognized as a viable solution, this approach has been adopted in administrative settings and within research agendas to mitigate gender bias in Natural Language Processing applications for automatic translation, i.e., Machine Translation (MT) systems and Large Language Models (LLMs). Its full integration requires adequate resources that, on the one hand, enable researchers to benchmark the ability of these technologies to handle gender-related issues and, on the other, help to reduce data inequities. This thesis seeks to contribute to this line of research by presenting an English-Spanish test set for gender-neutral translation (GNT), derived from the GeNTE English-Italian corpus (Piergentili et al., 2023). To open the door to experiments aimed at automating the GNT task, this newly obtained dataset was used to test the inclusive abilities of a Large Language Model, specifically Llama 3 (AI@Meta, 2024). Results from a four-level manual evaluation revealed that while the model can, to some extent, generate gender-neutral translations and generalize from a few examples, it still exhibits a bias toward masculine forms in both ambiguous and unambiguous gender scenarios. These findings confirm the issue of gender bias and stress the importance of conducting research that strives for inclusion, ensuring these technologies serve and benefit the entire community.

Document type

Degree thesis (Master's degree)

Thesis author

Cupin, Eleonora

Thesis supervisor

Garcea, Federico

Thesis co-supervisor

Savoldi, Beatrice ; Ferraresi, Adriano

School

Lingue e Letterature, Traduzione e Interpretazione

Degree programme