Marongiu, Gian Mario
(2025)
Personality Alignment in Large Language Models: Analysis and Fine-Tuning through Preference Optimization.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270], restricted-access document.
Full-text availability:
PDF document (Thesis)
Full text not accessible until 22 July 2026.
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.
Abstract
As Large Language Models (LLMs) become increasingly embedded in interactive applications, the ability to personalize their behavior—particularly through personality traits—has emerged as a key goal. Personality shaping can enhance user engagement, trust, and effectiveness, but current methods often rely on large-scale models and manually labeled data, limiting scalability and accessibility.
This work presents a fully automated pipeline for aligning small-scale Large Language Models with the Big Five personality traits without relying on human-labeled data. The proposed method leverages LLMs themselves at every stage of the process: generating personality-variant responses, evaluating those responses through preference ratings, training a reward model, and fine-tuning the base model via Supervised Fine-Tuning and Direct Preference Optimization. Special attention is given to addressing challenges specific to smaller models, including prompt-level positional biases—mitigated through systematic inversion of input order—and limited representational capacity, addressed by decomposing complex tasks into simpler sub-tasks across the pipeline. To reduce bias in the evaluation phase, rating tasks are reframed as binary preference comparisons rather than Likert-scale scoring. While the effectiveness of the alignment remains difficult to conclusively measure, the fine-tuned models exhibit consistent patterns in self-assessed personality expression over time. The findings suggest that personality shaping is possible even in low-resource, fully autonomous settings, though the work also highlights the need for improved evaluation methods—such as testing the aligned LLM on specific benchmarks or conducting human-centered assessments—to better capture the nuances of personality in LLM behavior.
Document type
Degree thesis (Laurea magistrale)
Thesis author
Marongiu, Gian Mario
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Artificial Intelligence [LM-DM270]
Degree regulations (Ordinamento CdS)
DM270
Keywords
Large Language Models, personality alignment, reward modeling, Big Five personality traits, Direct Preference Optimization
Thesis defence date
22 July 2025