Intrinsic Motivation for Creative Decision-Making in Reinforcement Learning

Grotto, Giovanni (2026) Intrinsic Motivation for Creative Decision-Making in Reinforcement Learning. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], Restricted-access document.

Full-text documents available:

PDF document (Thesis)
Full text not accessible until 30 June 2027.
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly forbidden. All other rights to the material are reserved.
Abstract

This thesis introduces the Intrinsic Surprise Module (ISM), a reinforcement learning (RL) framework designed to encourage creative decision-making by reducing behavioral convergence and promoting diverse strategies. The ISM operates as an internal observer that learns to predict an agent’s actions based on its past behavior in similar states. 'Surprise' is quantified as the discrepancy between predicted and actual actions, and this prediction error is used to generate an auxiliary intrinsic reward. By incentivizing deviations from routine behavior while preserving the primary task objective, the module encourages exploration of unconventional yet effective strategies. The approach is algorithm-agnostic and integrates with modern policy-gradient methods such as PPO and GRPO with minimal computational overhead. The thesis demonstrates that surprise can be formally incorporated into RL as a prediction-error signal without modifying the underlying optimization process. This mechanism mitigates the tendency of agents to converge to a single solution, enabling the emergence of multiple high-quality behavioral trajectories. The work further identifies environmental characteristics, such as structured stochasticity, solution diversity, and emergent complexity, where creativity-oriented policies provide the greatest benefit. Empirical evaluations across multiple domains, including Minigrid navigation, MiniHack game environments, multi-agent settings, and generative tasks such as creative chess puzzle generation, show that ISM-driven agents maintain competitive task performance while exhibiting significantly greater policy entropy and behavioral diversity than baseline intrinsic motivation methods. These results suggest that surprise-driven intrinsic objectives offer a principled pathway for guiding reinforcement learning toward complex and creative behaviors beyond conventional reward maximization.
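
The mechanism described in the abstract (an observer that predicts the agent's action from past behavior, with the prediction error turned into an auxiliary reward) can be illustrated with a minimal sketch. This is not the thesis's implementation, whose full text is not accessible: the tabular count-based predictor, the use of negative log-likelihood as the surprise measure, the weight `beta`, and all class and function names below are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


class CountingActionPredictor:
    """Hypothetical tabular observer: estimates p(action | state) from past behavior."""

    def __init__(self, n_actions, smoothing=1.0):
        self.n_actions = n_actions
        # Laplace smoothing keeps the probability of unseen actions above zero.
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: [0] * n_actions)

    def prob(self, state, action):
        c = self.counts[state]
        total = sum(c) + self.smoothing * self.n_actions
        return (c[action] + self.smoothing) / total

    def surprise(self, state, action):
        # Assumed surprise measure: negative log-likelihood of the action taken.
        return -math.log(self.prob(state, action))

    def update(self, state, action):
        self.counts[state][action] += 1


def shaped_reward(predictor, state, action, extrinsic_reward, beta=0.1):
    """Add a surprise bonus (weighted by the assumed beta) to the task reward,
    then record the action so the observer keeps learning."""
    bonus = beta * predictor.surprise(state, action)
    predictor.update(state, action)
    return extrinsic_reward + bonus
```

In such a setup the shaped reward would simply replace the environment reward passed to a policy-gradient learner such as PPO, leaving the optimizer itself untouched. Actions the agent has taken often in a state earn a shrinking bonus, while rarely taken actions earn a larger one.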

Document type

Degree thesis (Laurea magistrale)

Thesis author

Grotto, Giovanni

Thesis supervisor

Musolesi, Mirco

Thesis co-supervisor

Franceschelli, Giorgio

School

Engineering and Architecture

Degree programme