Vaccari, Giulio (2023). Performance Analysis of the Design of Experience Replay Buffers based on Intrinsic Motivation in Deep Reinforcement Learning. [Laurea magistrale], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270].
Full-text documents available:
      Abstract
In Reinforcement Learning, an intelligent system, usually referred to as an agent, must learn to maximize a cumulative reward by behaving correctly in an initially unknown environment. In order to improve, the agent must collect feedback from its interactions with the surrounding world, which guides it in adapting its actions to achieve better scores. However, in some environments feedback is not constantly provided, which makes learning more difficult. In these circumstances we say that the reward is sparse, and additional modules may need to be included in the learning framework to improve agent performance. Methods based on intrinsic motivation try to address the problem of sparse feedback by introducing an additional reward that incentivizes the agent when its behavior leads it to explore interesting regions of the environment. For example, this reward could be proportional to the novelty of the states visited by the agent during its exploration. In this way, the agent learns to explore the problem's state space more effectively, without being blocked by the absence of feedback. This thesis aims to implement and analyze a new framework for dealing with sparse-reward environments.
To this end, three models are implemented, each based on a different intrinsic motivation technique. Each model makes use of a prioritized experience replay buffer in which transition priorities are given by intrinsic motivation scores. Analysis of the results shows that prioritization based on temporal-difference errors remains the best-performing approach, but it also reveals interesting potential in certain categories of intrinsic motivation techniques, which can achieve higher scores than a uniform-priority model.
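To make the mechanism described in the abstract concrete, the following is a minimal sketch, not the code developed in the thesis, of a proportional prioritized experience replay buffer in which a transition's sampling priority is its intrinsic motivation score rather than its TD error. The IntrinsicPrioritizedReplayBuffer class and the count-based novelty_score helper are illustrative assumptions, not components of the thesis framework.

```python
# Illustrative sketch (assumed names, not the thesis implementation):
# a replay buffer whose sampling priorities come from an intrinsic
# motivation score instead of the TD error.
from collections import deque

import numpy as np


class IntrinsicPrioritizedReplayBuffer:
    """Proportional prioritized replay where priority = intrinsic score."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every transition sampleable
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def add(self, transition, intrinsic_score):
        # The intrinsic motivation score of the transition is used directly
        # as its priority (TD-error prioritization would store |delta| here).
        self.buffer.append(transition)
        self.priorities.append((abs(intrinsic_score) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        # Sample transitions with probability proportional to their priority.
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, weights


def novelty_score(state, visit_counts):
    """Hypothetical count-based novelty for hashable (discrete) states:
    rarely visited states receive a higher score."""
    visit_counts[state] = visit_counts.get(state, 0) + 1
    return 1.0 / np.sqrt(visit_counts[state])
```

The TD-error baseline the thesis compares against would use the same buffer but pass the magnitude of the temporal-difference error to add instead of the intrinsic score; a uniform-priority model corresponds to giving every transition the same priority.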
     
    
  
  
    
    
Document type
Degree thesis (Laurea magistrale)

Thesis author
Vaccari, Giulio

Thesis supervisor

Thesis co-supervisor

School

Degree programme
Artificial Intelligence [LM-DM270]

Degree programme regulations (Ordinamento CdS)
DM270

Keywords
Artificial Intelligence, Deep Reinforcement Learning, Intrinsic Motivation, Experience Replay Buffer, Experience Prioritization, Sparse Reward

Thesis defense date
23 March 2023

URI
  