Intrinsic Motivation for Creative Decision-Making in Reinforcement Learning

Grotto, Giovanni (2026) Intrinsic Motivation for Creative Decision-Making in Reinforcement Learning. [Master's degree thesis], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270], Restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text not accessible until 30 June 2027.
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.


Abstract

This thesis introduces the Intrinsic Surprise Module (ISM), a reinforcement learning (RL) framework designed to encourage creative decision-making by reducing behavioral convergence and promoting diverse strategies. The ISM operates as an internal observer that learns to predict an agent’s actions based on its past behavior in similar states. 'Surprise' is quantified as the discrepancy between predicted and actual actions, and this prediction error is used to generate an auxiliary intrinsic reward. By incentivizing deviations from routine behavior while preserving the primary task objective, the module encourages exploration of unconventional yet effective strategies. The approach is algorithm-agnostic and integrates with modern policy-gradient methods such as PPO and GRPO with minimal computational overhead. The thesis demonstrates that surprise can be formally incorporated into RL as a prediction-error signal without modifying the underlying optimization process. This mechanism mitigates the tendency of agents to converge to a single solution, enabling the emergence of multiple high-quality behavioral trajectories. The work further identifies environmental characteristics, such as structured stochasticity, solution diversity, and emergent complexity, where creativity-oriented policies provide the greatest benefit. Empirical evaluations across multiple domains, including Minigrid navigation, MiniHack game environments, multi-agent settings, and generative tasks such as creative chess puzzle generation, show that ISM-driven agents maintain competitive task performance while exhibiting significantly greater policy entropy and behavioral diversity than baseline intrinsic motivation methods. These results suggest that surprise-driven intrinsic objectives offer a principled pathway for guiding reinforcement learning toward complex and creative behaviors beyond conventional reward maximization.
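
The surprise mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the observer, the discrepancy measure, or how the intrinsic reward is weighted, so everything below is an assumption made for illustration: a count-based predictor over discrete states stands in for the learned observer, surprise is taken to be the negative log-likelihood of the chosen action, and the hypothetical names `TabularSurpriseModule`, `beta`, and `smoothing` do not come from the thesis.

```python
import math
from collections import defaultdict


class TabularSurpriseModule:
    """Hypothetical count-based stand-in for the observer in the abstract."""

    def __init__(self, n_actions, beta=0.1, smoothing=1.0):
        self.n_actions = n_actions
        self.beta = beta            # assumed weight of the intrinsic reward
        self.smoothing = smoothing  # assumed Laplace smoothing constant
        # Per-state action counts: the agent's past behavior in each state.
        self.counts = defaultdict(lambda: [0] * n_actions)

    def predict(self, state):
        """Predicted action distribution given past behavior in this state."""
        c = self.counts[state]
        total = sum(c) + self.smoothing * self.n_actions
        return [(x + self.smoothing) / total for x in c]

    def surprise(self, state, action):
        """Prediction error, measured here as negative log-likelihood."""
        return -math.log(self.predict(state)[action])

    def shaped_reward(self, state, action, task_reward):
        """Return task reward plus weighted surprise, then update the observer."""
        bonus = self.beta * self.surprise(state, action)
        self.counts[state][action] += 1
        return task_reward + bonus


# Usage: after the agent repeats action 0 in state "s0" twenty times,
# the routine action is less surprising than an action never taken there.
ism = TabularSurpriseModule(n_actions=4)
for _ in range(20):
    ism.shaped_reward("s0", 0, task_reward=0.0)
routine = ism.surprise("s0", 0)
novel = ism.surprise("s0", 1)
```

Because the bonus is simply added to the task reward before it reaches the learner, a sketch of this kind leaves the policy-gradient update itself untouched, which is consistent with the abstract's claim that the optimization process is not modified.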

Document type: Master's degree thesis (Laurea magistrale)
Thesis author: Grotto, Giovanni
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Degree programme regulations: DM270
Keywords: Reinforcement Learning, Artificial Intelligence, Creativity in AI
Thesis defense date: 26 March 2026
URI:
