Cecchi, Lorenzo
(2025)
Reinforcement Learning for non-deterministic problems - exploration of Chance Nodes.
[Laurea magistrale], Università di Bologna, Corso di Studio in
Matematica [LM-DM270]
Abstract
The quest to teach computers how to make decisions in complex and chaotic environments has been a central research topic since their invention. Among the frameworks proposed for this purpose, Reinforcement Learning stands out. Reinforcement Learning algorithms are often characterized by the need to balance exploration and exploitation. A simplified but fundamental instance of this dilemma is the multi-armed bandit problem. Over the years, a rich theory has been developed around this problem, including performance bounds and optimal strategies under various assumptions. Classical algorithms such as the ε-greedy strategy and Upper Confidence Bounds have been widely studied and applied. However, in many realistic settings, some form of prior knowledge or initial preference is available regarding possible actions. This leads to extensions of the classical bandit model that incorporate Bayesian priors, like the Thompson sampling algorithm, or externally provided preferences, such as those employed in PUCT: a key component of the Monte Carlo Tree Search algorithm used in AlphaZero. The aim of this thesis is to investigate the relationships between bandit strategies and tree search methods for decision-making, especially in the presence of stochasticity.
Document type
Degree thesis (Laurea magistrale)
Thesis author
Cecchi, Lorenzo
Curriculum
CURRICULUM ADVANCED MATHEMATICS FOR APPLICATIONS
Degree programme regulation
DM270
Keywords
Reinforcement Learning, Chance Nodes, Monte Carlo Tree Search
Thesis defence date
25 July 2025