Marongiu, Gian Mario
(2025)
Personality Alignment in Large Language Models: Analysis and Fine-Tuning through Preference Optimization.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270], restricted-access document.
Full-text availability:
PDF document (Thesis)
Full text not accessible until 22 July 2026.
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.
Abstract
As Large Language Models (LLMs) become increasingly embedded in interactive applications, the ability to personalize their behavior—particularly through personality traits—has emerged as a key goal. Personality shaping can enhance user engagement, trust, and effectiveness, but current methods often rely on large-scale models and manually labeled data, limiting scalability and accessibility.
This work presents a fully automated pipeline for aligning small-scale Large Language Models with the Big Five personality traits without relying on human-labeled data. The proposed method leverages LLMs themselves at every stage of the process: generating personality-variant responses, evaluating those responses through preference ratings, training a reward model, and fine-tuning the base model via Supervised Fine-Tuning and Direct Preference Optimization. Special attention is given to addressing challenges specific to smaller models, including prompt-level positional biases—mitigated through systematic inversion of input order—and limited representational capacity, addressed by decomposing complex tasks into simpler sub-tasks across the pipeline. To reduce bias in the evaluation phase, rating tasks are reframed as binary preference comparisons rather than Likert-scale scoring. While the effectiveness of the alignment remains difficult to conclusively measure, the fine-tuned models exhibit consistent patterns in self-assessed personality expression over time. The findings suggest that personality shaping is possible even in low-resource, fully autonomous settings, though the work also highlights the need for improved evaluation methods—such as testing the aligned LLM on specific benchmarks or conducting human-centered assessments—to better capture the nuances of personality in LLM behavior.
Document type
Degree thesis (Laurea magistrale)
Thesis author
Marongiu, Gian Mario
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Artificial Intelligence [LM-DM270]
Degree regulations (Ordinamento CdS)
DM270
Keywords
Large Language Models, personality alignment, reward modeling, Big Five personality traits, Direct Preference Optimization
Thesis defence date
22 July 2025