Pandini, Simone
(2024)
Matrix factorization techniques for Large Language Models.
[Master's degree thesis], Università di Bologna, Degree Programme in
Mathematics [LM-DM270]
Abstract
In recent years, the development of Large Language Models (LLMs) has revolutionized the field of natural language processing (NLP), enabling significant advances in several contexts, such as text translation, code generation and question answering. To address all these tasks, LLMs have become increasingly complex and resource-intensive, since they require extensive training on huge amounts of data, mostly in English, that need to be preprocessed. Since the best-known LLMs are meant to be general purpose, a fine-tuning procedure is usually needed to tailor the models to specific domains. However, updating all parameters would be computationally expensive and would require significant memory resources. This has led to the exploration of parameter-efficient fine-tuning (PEFT) methods, which modify only a small subset of parameters while keeping the majority fixed. This approach not only reduces the computational effort but also minimizes the risk of catastrophic forgetting, particularly when working with limited task-specific data. Additionally, compared to training from scratch, fine-tuning can be achieved with fewer labeled instances and less computing resources by exploiting the knowledge already present in these huge models. This improves the effectiveness of deploying LLMs in practical applications while also democratizing access to cutting-edge AI capabilities. The most important PEFT methods rely on different types of matrix factorization, such as low-rank, sparse or Singular Value Decomposition, to decompose the trainable fine-tuning matrix. These factorizations contain far fewer parameters than full fine-tuning, significantly decreasing training time and computational resources without affecting the overall model's performance.
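As a rough illustration of the low-rank idea mentioned in the abstract, the following is a minimal sketch of a LoRA-style adapter in PyTorch: the pretrained weight matrix is frozen and only two small factors B and A are trained, so the number of trainable parameters scales with the rank r rather than with the full weight dimensions. The class name and the rank and alpha values are illustrative assumptions, not the implementation used in the thesis.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical low-rank adapter sketch: the frozen weight W is updated as
    W + B @ A, where B and A together hold far fewer parameters than a full
    d_out x d_in update matrix."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                  # pretrained weights stay fixed
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # trainable factor, rank x d_in
        self.B = nn.Parameter(torch.zeros(d_out, rank))         # trainable factor, d_out x rank
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # a full update would need d_out * d_in parameters;
        # the factorization B @ A needs only rank * (d_in + d_out)
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(d_in=4096, d_out=4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")  # ~65K trainable vs ~16.8M total
```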
Document type
Degree thesis
(Master's degree)
Thesis author
Pandini, Simone
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM ADVANCED MATHEMATICS FOR APPLICATIONS
Degree programme regulations
DM270
Keywords
Large Language Models, Transformers, PEFT, HPC
Thesis defence date
20 December 2024
URI