A Tough Row to Hoe: Instruction Fine-Tuning LLaMA 3.2 for Multilingual Sentence Disambiguation and Idiom Identification

Ciminari, Debora (2025) A Tough Row to Hoe: Instruction Fine-Tuning LLaMA 3.2 for Multilingual Sentence Disambiguation and Idiom Identification. [Master's degree thesis], Università di Bologna, Degree programme in Specialized translation [LM-DM270] - Forlì

Full-text documents available:

PDF document (Thesis)
Available under license: Creative Commons: Attribution - NonCommercial - ShareAlike 4.0 (CC BY-NC-SA 4.0)
Download (589kB)

Abstract

Idiomatic expressions (IEs) are a fundamental aspect of language, traditionally defined as expressions whose meanings cannot be inferred from their individual components. However, modern linguistic theories propose a more complex definition of idiomaticity, which is now understood as a continuum along which IEs can be placed depending on multiple factors. This complexity poses challenges for natural language processing (NLP) applications, where effective handling of IEs can improve performance in various tasks, including sentiment analysis, question answering, text summarisation, and machine translation. This thesis contributes to the study of IEs in NLP by instruction fine-tuning LLaMA 3.2 1B on two tasks: sentence disambiguation and idiom identification. To this end, a multilingual instruction-formatted dataset was created, incorporating English, Italian, and Portuguese as both instruction and input languages. This made it possible to investigate the interaction between the instruction language and the input language and to examine the model’s performance when the two match and when they differ. The findings showed that aligning instruction and input languages does not always improve performance, highlighting complex cross-linguistic interactions. Moreover, while fine-tuning improved idiom identification, it led to slight declines in sentence disambiguation, possibly due to dataset limitations and the lack of hyperparameter tuning. Future work could expand language diversity, refine fine-tuning strategies, and explore other LLM architectures for better performance.

Document type

Degree thesis (Master's degree)

Thesis author

Ciminari, Debora

Thesis supervisor

Barron Cedeno, Luis Alberto

Thesis co-supervisor

Milicevic Petrovic, Maja

School

Languages and Literatures, Translation and Interpreting

Degree programme