Multi-agent deep reinforcement learning for drone swarms in static and dynamic environments

Laudenzi, Guido (2024) Multi-agent deep reinforcement learning for drone swarms in static and dynamic environments. [Master's degree thesis], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270]

Full-text documents available:

PDF document (Thesis)
Available under license: Creative Commons: Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)
Download (4MB)

Abstract

The application of robotics, particularly drone swarms, in operational settings presents a frontier in leveraging collective intelligence for complex spatial tasks. While Deep Reinforcement Learning (DRL) has significantly advanced the autonomous control of wheeled robots in static, 2D spaces, its adaptation to flying drones navigating dynamic 3D environments remains inadequately documented. State-of-the-art approaches rely on Optimization-Based Motion Planning, which employs pre-programmed constraints shared among agents with limited learning capabilities and falls short in highly dynamic environments and in tasks that were not explicitly pre-programmed. This thesis introduces a novel DRL application for drone swarm path planning in both static and dynamic environments, with an emphasis on obstacle avoidance. The core objective is to showcase the drones' ability to autonomously learn an optimal trajectory in any given environment. By applying a simple unified model across varying scenarios, this study demonstrates the adaptability and generalization capabilities of the proposed DRL algorithm. The methodology employs Curriculum Learning to introduce complexity incrementally, incorporated into a Proximal Policy Optimization (PPO) algorithm. Combining a convolutional neural network encoding of a simple observation space (an RGB camera image together with the drone's position) with an appropriate reward function enables the drones to learn and exhibit emergent collective behaviors. The findings suggest that the proposed DRL model achieves training convergence and near-optimal trajectories in different maps and presents a scalable solution for the control of drone swarms composed of varying numbers of agents. This research contributes to the growing body of knowledge by providing a viable alternative to supervised learning and classical control theory, challenging the current state-of-the-art in drone navigation.

Document type

Degree thesis (Master's degree)

Thesis author

Laudenzi, Guido

Thesis supervisor

Roli, Andrea

Thesis co-supervisor

Kamimura, Akiya

School

Engineering and Architecture

Degree programme