Gheriglio, Giovanni
(2025)
Investigating Contextual Memorization in Large Language Models through Prompting Techniques.
[Laurea magistrale], Università di Bologna, Corso di Studio in Ingegneria informatica [LM-DM270]. Full-text document not available.
The full text is not available at the author's request.
Abstract
Large Language Models are central to advancements in Natural Language Processing and human-computer interaction, yet their increasing capabilities raise ethical concerns, such as memorization of sensitive or copyrighted information. This thesis introduces the concept of contextual memorization, exploring how models store context-specific data. Using poems by renowned authors as a case study, the extent of memorization was analyzed based on elements like author and title. The study employs techniques such as Prompt Tuning, applied to both Supervised Fine-Tuning and Reinforcement Learning, to create soft prompts that enhance the retrieval of memorized content. Results demonstrate that soft prompts effectively increase reproduction rates, highlighting implications for model training and ethical considerations.
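For readers unfamiliar with Prompt Tuning, the sketch below shows how trainable soft prompts can be attached to a causal language model. It assumes the Hugging Face PEFT library; the base model name and the initialization text are illustrative placeholders, not the configuration actually used in the thesis.
```python
# Minimal Prompt Tuning sketch with Hugging Face PEFT (illustrative only;
# model choice and prompt text are placeholders, not the thesis's setup).
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 20 trainable "virtual token" embeddings, initialized from a natural-language hint.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Reproduce the poem given its author and title:",
    num_virtual_tokens=20,
    tokenizer_name_or_path=model_name,
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the soft prompt embeddings are updated

# Training would then optimize these virtual tokens (e.g. with a standard
# language-modeling loss under Supervised Fine-Tuning, or a reward signal under
# Reinforcement Learning) while the base model weights stay frozen.
```
Once trained, the soft prompt is prepended to every query, so reproduction rates of memorized text can be compared against a baseline that uses the same model without the learned prompt.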
Document type
Degree thesis (Laurea magistrale)
Thesis author
Gheriglio, Giovanni
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM INGEGNERIA INFORMATICA
Degree programme regulations (Ordinamento CdS)
DM270
Keywords
Large Language Models, memorization, Reinforcement Learning, Prompt Tuning, Supervised Fine-Tuning, Multi-Armed Bandit
Thesis defence date
25 March 2025
URI