Cupin, Eleonora
 
(2024)
Breaking gender bias in Machine Translation: Expanding the GeNTE corpus and exploring LLMs (inclusive) capabilities.
[Master's degree thesis], Università di Bologna, Degree Programme in Specialized translation [LM-DM270] - Forlì, Restricted-access document.

Available full-text documents:
PDF document (Thesis, 1 MB). Full text accessible only to the University's institutional users.
Licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed for strictly personal study, research, and teaching purposes; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.

      Abstract
      Gender bias, deeply rooted in our society, not only shapes our communication practices but is also reflected and reinforced within the language technologies we use. To counter this phenomenon, social movements have pushed for new inclusive approaches, one of which is gender-neutral language. Recognized as a viable solution, this approach has been adopted in administrative settings and within research agendas to mitigate gender bias in Natural Language Processing applications for automatic translation, i.e., Machine Translation (MT) systems and Large Language Models (LLMs). Its full integration, however, requires adequate resources that, on the one hand, enable researchers to benchmark how well these technologies handle gender-related issues and, on the other, help reduce data inequities.
      This thesis seeks to contribute to this line of research by presenting an English-Spanish test set for gender-neutral translation (GNT), derived from the English-Italian GeNTE corpus (Piergentili et al., 2023). To pave the way for experiments aimed at automating the GNT task, the new dataset was used to test the inclusive capabilities of a Large Language Model, specifically Llama 3 (AI@Meta, 2024). Results from a four-level manual evaluation revealed that, while the model can to some extent generate gender-neutral translations and generalize from a few examples, it still exhibits a bias toward masculine forms in both gender-ambiguous and gender-unambiguous scenarios. These findings confirm that gender bias remains an issue and underscore the importance of inclusion-oriented research to ensure that these technologies serve and benefit the entire community.

Document type
Thesis (Master's degree)

Thesis author
Cupin, Eleonora

Thesis supervisor

Thesis co-supervisor

School

Degree programme

Curriculum
CURRICULUM TRANSLATION AND TECHNOLOGY

Degree programme regulations
DM270

Keywords
gender bias, gender inclusive language, large language models, machine translation, gender neutral translation

Thesis defence date
17 December 2024

URI