Vaccari, Giulio (2023). Performance Analysis of the Design of Experience Replay Buffers based on Intrinsic Motivation in Deep Reinforcement Learning. [Laurea magistrale], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270].
Full-text documents available:
      Abstract
In Reinforcement Learning, an intelligent system, usually referred to as an agent, must learn to maximize a cumulative reward by behaving correctly in an initially unknown environment. In order to improve, the agent must collect feedback from its interactions with the surrounding world, which guides it in adapting its actions to achieve better scores. However, in some environments feedback is not constantly provided, which makes learning more difficult. In these circumstances we say that the reward is sparse, and additional modules may need to be included in the learning framework to improve agent performance. Methods based on intrinsic motivation try to address the problem of sparse feedback by introducing an additional reward that incentivizes the agent when its behavior leads it to explore interesting regions of the environment. For example, this reward could be proportional to the novelty of the states visited by the agent during its exploration. In this way, the agent learns to explore the problem's state space more effectively, without being blocked by the absence of feedback. This thesis aims to implement and analyze a new framework for dealing with sparse-reward environments.
To this end, three models are implemented, each based on a different intrinsic motivation technique. Each model makes use of a prioritized experience replay buffer in which transition priorities are given by intrinsic motivation scores. Analysis of the results shows that prioritization based on temporal-difference errors remains the best-performing approach, but it also reveals interesting potential in certain categories of intrinsic motivation techniques, which can achieve higher scores than a uniform-priority model.
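To make the mechanism described in the abstract concrete, the following is a minimal sketch, not the code developed in the thesis, of a proportional prioritized experience replay buffer in which a transition's sampling priority is its intrinsic motivation score rather than its TD error. The IntrinsicPrioritizedReplayBuffer class and the count-based novelty_score helper are illustrative assumptions, not components of the thesis framework.

```python
# Illustrative sketch (assumed names, not the thesis implementation):
# a replay buffer whose sampling priorities come from an intrinsic
# motivation score instead of the TD error.
from collections import deque

import numpy as np


class IntrinsicPrioritizedReplayBuffer:
    """Proportional prioritized replay where priority = intrinsic score."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every transition sampleable
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def add(self, transition, intrinsic_score):
        # The intrinsic motivation score of the transition is used directly
        # as its priority (TD-error prioritization would store |delta| here).
        self.buffer.append(transition)
        self.priorities.append((abs(intrinsic_score) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        # Sample transitions with probability proportional to their priority.
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, weights


def novelty_score(state, visit_counts):
    """Hypothetical count-based novelty for hashable (discrete) states:
    rarely visited states receive a higher score."""
    visit_counts[state] = visit_counts.get(state, 0) + 1
    return 1.0 / np.sqrt(visit_counts[state])
```

The TD-error baseline the thesis compares against would use the same buffer but pass the magnitude of the temporal-difference error to add instead of the intrinsic score; a uniform-priority model corresponds to giving every transition the same priority.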
     
    
  
  
    
    
Document type
Degree thesis (Laurea magistrale)

Thesis author
Vaccari, Giulio

Thesis supervisor

Thesis co-supervisor

School

Degree programme
Artificial Intelligence [LM-DM270]

Degree programme regulations (Ordinamento CdS)
DM270

Keywords
Artificial Intelligence, Deep Reinforcement Learning, Intrinsic Motivation, Experience Replay Buffer, Experience Prioritization, Sparse Reward

Thesis defense date
23 March 2023

URI
  