A Tough Row to Hoe: Instruction Fine-Tuning LLaMA 3.2 for Multilingual Sentence Disambiguation and Idiom Identification

Ciminari, Debora (2025) A Tough Row to Hoe: Instruction Fine-Tuning LLaMA 3.2 for Multilingual Sentence Disambiguation and Idiom Identification. [Laurea magistrale], Università di Bologna, Corso di Studio in Specialized translation [LM-DM270] - Forlì
Full-text document available:
PDF document (Thesis), 589 kB
Licence: Creative Commons Attribution - NonCommercial - ShareAlike 4.0 (CC BY-NC-SA 4.0)

Abstract

Idiomatic expressions (IEs) are a fundamental aspect of language, traditionally defined as expressions whose meanings cannot be inferred from their individual components. However, modern linguistic theories propose a more nuanced definition of idiomaticity, now understood as a continuum along which IEs can be placed depending on multiple factors. This complexity poses challenges for natural language processing (NLP) applications, where effective handling of IEs can improve performance in various tasks, including sentiment analysis, question answering, text summarisation, and machine translation. This thesis contributes to the study of IEs in NLP by instruction fine-tuning LLaMA 3.2 1B on two tasks: sentence disambiguation and idiom identification. To this end, a multilingual instruction-formatted dataset was created, incorporating English, Italian, and Portuguese as both instruction and input languages. This made it possible to investigate the interaction between instruction and input language and to examine the model’s performance when they match and when they differ. The findings showed that aligning instruction and input languages does not always improve performance, highlighting complex cross-linguistic interactions. Moreover, while fine-tuning enhanced idiom identification, it led to slight declines in sentence disambiguation, possibly due to dataset limitations and the lack of hyperparameter tuning. Future work could expand language diversity, refine fine-tuning strategies, and explore other LLM architectures for better performance.
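For illustration, the sketch below shows how an instruction-formatted example that crosses instruction and input languages might look, and how such data could be fed to LLaMA 3.2 1B for supervised fine-tuning with the Hugging Face transformers and datasets libraries. The prompt template, field names, example sentences, model identifier, and training settings are assumptions made for this sketch, not the pipeline actually used in the thesis.

# Minimal sketch: instruction-formatted examples and supervised fine-tuning
# of LLaMA 3.2 1B. All data fields, the Alpaca-style template, and the
# training settings are illustrative assumptions, not the thesis's setup.

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_ID = "meta-llama/Llama-3.2-1B"  # gated model; requires Hugging Face access

# Hypothetical examples crossing instruction language and input language
# (Italian instruction over an English input, and vice versa).
examples = [
    {
        "instruction": "La frase seguente contiene un'espressione idiomatica? Rispondi sì o no.",
        "input": "Finishing this thesis was a tough row to hoe.",
        "output": "sì",
    },
    {
        "instruction": "Is the highlighted expression used literally or idiomatically in the following sentence?",
        "input": "Ha gettato la spugna dopo il primo round.",
        "output": "idiomatically",
    },
]

def to_prompt(ex):
    # Simple Alpaca-style prompt template (an assumption).
    return (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Input:\n{ex['input']}\n\n"
            f"### Response:\n{ex['output']}")

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Turn each example into tokenised causal-LM training input.
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(to_prompt(ex), truncation=True, max_length=512),
    remove_columns=["instruction", "input", "output"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama32-idioms",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

In this layout, the crossed instruction/input conditions (e.g. Italian instruction with English input) come directly from how the dataset rows are authored, so the same training loop covers both the matched and mismatched language settings discussed in the abstract.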

Document type: Thesis (Laurea magistrale / Master's degree)
Author: Ciminari, Debora
Curriculum: Translation and Technology
Degree programme regulations: DM270
Keywords: natural language processing, large language models, LLaMA, instruction fine-tuning, idiomatic expressions, multilingual
Date of defence: 18 March 2025