The full text is not available at the author's request.
Abstract
Reinforcement learning has demonstrated remarkable capability in complex control tasks, yet the trial-and-error nature of policy optimization offers no guarantee that the learned controller will respect safety constraints, either during training or at deployment. This thesis presents CALOS (Control-Affine System Lyapunov On-manifold Safety), a runtime safety layer that enforces attitude constraints on a quadrotor without modifying the underlying learning algorithm.

CALOS formulates four tilt-angle inequalities and a Lyapunov descent condition as a single quadratic programme whose solution is the minimum-norm correction to the nominal torque output of the policy. The QP is solved exactly via active-set enumeration over the three-dimensional torque space, at a per-step cost compatible with massively parallel simulation. The safety layer is composable with any policy gradient method; Proximal Policy Optimization is used throughout this work.

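The abstract gives only the shape of the safety QP, but the scheme it describes amounts to a small linearly constrained programme: minimize ||u - u_nom||^2 over the torque u in R^3 subject to A u <= b, where the five rows of A and b encode the four tilt-angle inequalities and the Lyapunov descent condition. With a three-dimensional decision variable and five constraints there are at most C(5,0)+C(5,1)+C(5,2)+C(5,3) = 26 candidate active sets, each reducible to a linear solve of size at most 3x3, which is why exact enumeration stays cheap enough for massively parallel simulation. The sketch below illustrates this style of solver; the names, the constraint count, and the infeasibility handling are our assumptions, not the thesis implementation.

```python
import itertools

import numpy as np


def min_norm_correction(u_nom, A, b, tol=1e-9):
    """Exact solution of  min ||u - u_nom||^2  s.t.  A u <= b  for u in R^3,
    by enumerating candidate active sets (illustrative sketch only).

    A : (m, 3) constraint rows, e.g. four tilt-angle rows plus one
        Lyapunov descent row (m = 5).
    b : (m,) right-hand sides.
    Returns the corrected torque, or None if the QP is infeasible.
    """
    m = len(b)
    best_u, best_cost = None, np.inf
    for k in range(4):                               # 0..3 active rows in R^3
        for S in itertools.combinations(range(m), k):
            if k == 0:
                u = u_nom                            # unconstrained optimum
            else:
                As, bs = A[list(S)], b[list(S)]
                # KKT for  min ||u - u_nom||^2  s.t.  As u = bs:
                #   u = u_nom - As^T lam,  (As As^T) lam = As u_nom - bs
                try:
                    lam = np.linalg.solve(As @ As.T, As @ u_nom - bs)
                except np.linalg.LinAlgError:
                    continue                         # degenerate active set
                u = u_nom - As.T @ lam
            if np.all(A @ u <= b + tol):             # primal feasibility
                cost = float(np.dot(u - u_nom, u - u_nom))
                if cost < best_cost:
                    best_u, best_cost = u, cost
    return best_u
```

Enumeration is exact rather than iterative: the optimizer of the inequality QP is also the minimizer over its own active set treated as equalities, so the primal-feasible candidate with the lowest cost is the global solution, and the fixed, branch-light loop structure maps naturally onto batched simulation.
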
Six agents are evaluated on three reference trajectories of increasing difficulty: an unconstrained PPO baseline, an ATACOM-like projection, a Lyapunov scaling operator, their cascaded composition, and the two CALOS variants (exact QP and projected approximation). CALOS-QP achieves zero attitude-constraint violations on the training trajectory and substantially reduces violation count and duration on harder, out-of-distribution conditions, while the Lyapunov and Cascade agents collapse under spawn-offset conditions due to a torque-clamping failure mode that CALOS avoids by design. A secondary finding is that projecting every training action onto the safe manifold induces an implicit curriculum: the policy backbone trained with the safety layer active outperforms the unconstrained baseline by 55–74% on lateral tracking error even when the layer is subsequently disabled, demonstrating that constraint projection accelerates convergence by restricting exploration to the dynamically recoverable region of the state space.
Document type
Thesis (Master's degree)
Thesis author
Cesareo, Fabrizio
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
AUTOMATION ENGINEERING
Degree programme regulation
DM270
Keywords
Reinforcement Learning, Safety, Safety Layer, Lyapunov, Safe RL, Safe Reinforcement Learning, Online safety layer
Thesis defence date
25 March 2026
URI