Exploring Large Language Models as Reinforcement Learning Agents with Explicit Autonomous Reasoning

Bortolato, Samuele (2024) Exploring Large Language Models as Reinforcement Learning Agents with Explicit Autonomous Reasoning. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270]. Full-text document not available.
The full text is not available by the author's choice.

Abstract

Recent advancements in generative AI, particularly Large Language Models (LLMs), have been impressive. These models generalize to unseen tasks and show emergent capabilities, including in zero-shot and few-shot learning scenarios. Current research focuses on integrating LLMs into agents to serve as a foundation for learning and reasoning. Leveraging their acquired knowledge, these agents can navigate the web, simulate human behavior in multi-agent environments, and reason about their actions. Despite their successes, this class of agents faces limitations in executing non-textual actions. Currently, actions are either learned through human rewarders or acquired via behavioral cloning. In Reinforcement Learning (RL), LLMs have been used primarily as feature extractors, often teaching the agent to follow external commands through techniques like hindsight experience replay or text-action alignment. The few works that employ LLMs to sample actions for the environment constrain generation to a fixed set of commands, and so do not take full advantage of the models' generative capabilities. This project explores a novel approach that allows the model to explicitly reason about the task before generating an action. We conduct proof-of-concept experiments in which the agent freely generates reasoning through chain-of-thought prompting before producing a valid action in the environment using constrained generation. Additionally, we investigate the feasibility of employing multiple models, exploring the potential of guiding a smaller language model with a frozen Large Language Model, using natural language as a shared interface. We present and compare various implementation strategies, discuss our findings, and propose a future agenda for the development of more autonomous LLM-based agents.
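
The two-stage idea described in the abstract (free-form reasoning first, then an action restricted to what the environment accepts) can be illustrated with a minimal sketch. This is not the thesis implementation, whose full text is unavailable. The names `act`, `constrained_choice`, `toy_reasoner`, and `toy_score` are hypothetical, the language model is replaced by toy stand-in functions, and constrained generation is simplified to picking the best-scoring candidate from a list of valid actions rather than masking tokens during decoding.

```python
import re
from typing import Callable, Sequence, Tuple


def constrained_choice(
    score: Callable[[str, str], float],
    context: str,
    valid_actions: Sequence[str],
) -> str:
    """Return the valid action that scores highest given the context."""
    if not valid_actions:
        raise ValueError("valid_actions must not be empty")
    return max(valid_actions, key=lambda action: score(context, action))


def act(
    generate_reasoning: Callable[[str], str],
    score: Callable[[str, str], float],
    observation: str,
    valid_actions: Sequence[str],
) -> Tuple[str, str]:
    """Reason freely about the observation, then pick a valid action."""
    # Stage 1: unconstrained text, analogous to chain-of-thought output.
    reasoning = generate_reasoning(observation)
    # Stage 2: the action is restricted to the environment's valid set.
    context = f"{observation}\nReasoning: {reasoning}\nAction:"
    return reasoning, constrained_choice(score, context, valid_actions)


def toy_reasoner(observation: str) -> str:
    """Hypothetical stand-in for a language model producing reasoning."""
    return "The key is to the left, so I should move left."


def toy_score(context: str, action: str) -> float:
    """Hypothetical stand-in for a model's score of an action.

    Counts how often the words of the action occur in the context.
    """
    words = re.findall(r"[a-z]+", context.lower())
    return float(sum(words.count(w) for w in action.lower().split()))


valid_actions = ["move left", "move right", "pick up"]
reasoning, action = act(
    toy_reasoner, toy_score, "The key is to your left.", valid_actions
)
print(reasoning)
print(action)
```

Because the second stage only ever returns a member of `valid_actions`, the environment always receives an executable command, while the reasoning text itself stays unrestricted.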

Document type: Master's thesis (Laurea magistrale)
Thesis author: Bortolato, Samuele
Degree regulations (Ordinamento CdS): DM270
Keywords: Reinforcement Learning, Large Language Models, Chain of Thought Prompting, Constrained Generation
Thesis defense date: 19 March 2024
