Lyapunov on-manifold safety layer for control-affine systems in reinforcement learning

Cesareo, Fabrizio (2026) Lyapunov on-manifold safety layer for control-affine systems in reinforcement learning. [Laurea magistrale], Università di Bologna, Degree Programme in Automation Engineering / Ingegneria dell'Automazione [LM-DM270]. Full-text document not available.
The full text is not available at the author's request. (Contact the author)

Abstract

Reinforcement learning has demonstrated remarkable capability in complex control tasks, yet the trial-and-error nature of policy optimization offers no guarantee that the learned controller will respect safety constraints, either during training or at deployment. This thesis presents CALOS (Control-Affine System Lyapunov On-manifold Safety), a runtime safety layer that enforces attitude constraints on a quadrotor without modifying the underlying learning algorithm.

CALOS formulates four tilt-angle inequalities and a Lyapunov descent condition as a single quadratic programme whose solution is the minimum-norm correction to the nominal torque output of the policy. The QP is solved exactly via active-set enumeration over the three-dimensional torque space, at a per-step cost compatible with massively parallel simulation. The safety layer is composable with any policy gradient method; Proximal Policy Optimization is used throughout this work.

Six agents are evaluated on three reference trajectories of increasing difficulty: an unconstrained PPO baseline, an ATACOM-like projection, a Lyapunov scaling operator, their cascaded composition, and the two CALOS variants (exact QP and projected approximation). CALOS-QP achieves zero attitude-constraint violations on the training trajectory and substantially reduces violation count and duration on harder, out-of-distribution conditions, while the Lyapunov and Cascade agents collapse under spawn-offset conditions due to a torque-clamping failure mode that CALOS avoids by design.

A secondary finding is that projecting every training action onto the safe manifold induces an implicit curriculum: the policy backbone trained with the safety layer active outperforms the unconstrained baseline by 55-74% on lateral tracking error even when the layer is subsequently disabled, demonstrating that constraint projection accelerates convergence by restricting exploration to the dynamically recoverable region of the state space.
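Since the full text is unavailable, the exact QP construction is not reproduced here; the following is a minimal sketch of the mechanism the abstract describes. For a control-affine system xdot = f(x) + g(x)u, the Lyapunov descent condition Vdot = L_f V(x) + L_g V(x) u <= -alpha V(x) is linear in the torque u, as are linearized tilt-angle bounds, so the filter reduces to min_u ||u - u_nom||^2 subject to A u <= b, with A stacking those constraint rows. Because u lies in R^3, the optimal active set contains at most three constraints, so the QP can be solved exactly by enumerating candidate active sets. The Python below is an illustrative implementation under these assumptions; all names (qp_safety_filter, A, b, u_nom) are hypothetical and not taken from the thesis.

```python
import itertools
import numpy as np

def qp_safety_filter(u_nom, A, b, tol=1e-9):
    """Minimum-norm torque correction:
        min ||u - u_nom||^2   s.t.   A @ u <= b,   u in R^3,
    solved exactly by enumerating candidate active sets."""
    if np.all(A @ u_nom <= b + tol):
        return u_nom  # nominal torque already safe: no correction needed
    m, n = A.shape
    best_u, best_cost = None, np.inf
    # The optimal active set of a QP over R^n has at most n constraints.
    for k in range(1, n + 1):
        for S in itertools.combinations(range(m), k):
            A_S, b_S = A[list(S)], b[list(S)]
            try:
                # KKT stationarity: u = u_nom - A_S.T @ lam, with A_S @ u = b_S
                lam = np.linalg.solve(A_S @ A_S.T, A_S @ u_nom - b_S)
            except np.linalg.LinAlgError:
                continue  # linearly dependent rows: skip degenerate subset
            if np.any(lam < -tol):
                continue  # dual infeasibility: this active set is not optimal
            u = u_nom - A_S.T @ lam
            if np.all(A @ u <= b + tol):  # primal feasibility on every row
                cost = float(np.dot(u - u_nom, u - u_nom))
                if cost < best_cost:
                    best_u, best_cost = u, cost
    return best_u  # None if the constraint set is empty (QP infeasible)
```

With m = 5 constraint rows (four tilt inequalities plus the descent row), the loop visits at most C(5,1) + C(5,2) + C(5,3) = 25 subsets, each costing one linear solve of size at most 3x3, which is consistent with the abstract's claim of a per-step cost compatible with massively parallel simulation.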

Document type
Degree thesis (Laurea magistrale)
Thesis author
Cesareo, Fabrizio
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
AUTOMATION ENGINEERING
Degree programme regulations
DM270
Keywords
Reinforcement Learning, Safety, Safety Layer, Lyapunov, Safe RL, Safe Reinforcement Learning, Online safety layer
Thesis defense date
25 March 2026
URI
