Abstract
Recent advancements in deep reinforcement learning have opened new strategies for developing intelligent and autonomous robots, such as unmanned aerial vehicles (UAVs). Drones are among the most agile UAVs and can solve highly dynamic tasks with flight policies learned through deep reinforcement learning techniques. Specifically, a neural network can be trained to map sensory data directly to control actions. This end-to-end approach produces outputs with lower latency and less information loss than approaches that separate perception, planning, and control. For this reason, it represents a valid solution for agile flight.
In addition, new parallel software simulators can provide large amounts of high-quality data in a short time, which can be used to rapidly learn control policies that are robust to domain transfer.
In this thesis, a drone is trained with the deep reinforcement learning algorithm PPO using three different reward functions. The resulting control policy emerges from the imposed constraints and the feedback provided. The outcomes are compared over randomized scenarios. The best policy achieves success rates of 97.52% and 77.30% in the randomized training and test scenarios, respectively. The results demonstrate that a learning-based control stack, without any planning or sensing stages, can maneuver the quadrotor and solve the agile task, even in previously unseen scenarios. These results pave the way for future work implementing this solution in real-world applications.
Document type
Thesis (Master's degree)
Thesis author
Mengozzi, Sebastiano
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
quadrotor, drone, reinforcement learning, artificial intelligence, robotics
Thesis defense date
22 March 2023
URI