Breaking gender bias in Machine Translation: Expanding the GeNTE corpus and exploring LLMs (inclusive) capabilities

Cupin, Eleonora (2024) Breaking gender bias in Machine Translation: Expanding the GeNTE corpus and exploring LLMs (inclusive) capabilities. [Master's degree thesis], Università di Bologna, Degree Programme in Specialized Translation [LM-DM270] - Forlì, restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text accessible only to the University's institutional users
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed for strictly personal study, research, and teaching purposes; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

Gender bias, deeply rooted in our society, not only shapes our communication practices but is also reflected and reinforced in the language technologies we use. To counter this phenomenon, social movements have pushed for new inclusive approaches, one of which is gender-neutral language. Recognized as a viable solution, this approach has been adopted in administrative settings and in research agendas aimed at mitigating gender bias in Natural Language Processing applications for automatic translation, i.e., Machine Translation (MT) systems and Large Language Models (LLMs). Its full integration requires adequate resources that, on the one hand, enable researchers to benchmark how well these technologies handle gender-related issues and, on the other, help reduce data inequities. This thesis seeks to contribute to this line of research by presenting an English-Spanish test set for gender-neutral translation (GNT), derived from the English-Italian GeNTE corpus (Piergentili et al., 2023). To open the door to experiments aimed at automating the GNT task, the new dataset was used to test the inclusive abilities of a Large Language Model, specifically Llama 3 (AI@Meta, 2024). Results from a four-level manual evaluation revealed that, while the model can to some extent generate gender-neutral translations and generalize from a few examples, it still exhibits a bias toward masculine forms in both gender-ambiguous and gender-unambiguous scenarios. These findings confirm the persistence of gender bias and stress the importance of research that strives for inclusion, so that these technologies serve and benefit the entire community.

Document type
Degree thesis (Master's degree)
Thesis author
Cupin, Eleonora
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM TRANSLATION AND TECHNOLOGY
Degree programme regulations
DM270
Keywords
gender bias, gender inclusive language, large language models, machine translation, gender neutral translation
Thesis defence date
17 December 2024
URI