Presepi, Alex
(2026)
Seamless smart home interaction via pose-based gesture recognition on edge devices.
[Master's degree thesis], Università di Bologna, Degree programme in
Artificial Intelligence [LM-DM270], restricted-access document.
Abstract
Gesture-based interfaces are a promising alternative to traditional smart home interaction paradigms, offering silent, privacy-preserving, and wearable-free control. This thesis investigates the design and deployment of a vision-based gesture recognition system for home automation, targeting low-power edge devices, operation in complete darkness via infrared imaging, and multi-person domestic scenarios. The work follows a progressive refinement methodology. Starting from a 3D spatial interaction paradigm, in which device selection is resolved by ray-casting from estimated body joints, we first explored passive stereo vision, which proved insufficient due to temporal jitter and triangulation errors. Transitioning to active depth sensing via the Kinect v1 and domain-adapting the A2J pose estimation model yielded a functional 3D system. However, self-occlusion, the inability to discriminate closely placed targets, and high computational demands motivated a paradigm shift towards a lighter 2D state-based model, in which control is decomposed into sequential selection and action steps. A first 2D solution based on YOLO pose, fine-tuned on a custom dataset covering low-light conditions and unconventional poses, uses arm-raising as the interaction trigger and achieves strong performance on a Raspberry Pi via NCNN export. The final system replaces arm-raising with hand gesture classification: colour-agnostic augmentation enables infrared generalization, while a two-stage pipeline (wide-area search followed by target tracking) sustains real-time performance at room scale on a Raspberry Pi 5. Across all tested solutions, domain adaptation emerged as the critical bottleneck, as off-the-shelf models consistently failed under the combination of infrared illumination, wide field-of-view optics, and residential deployment conditions.
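The abstract's 3D selection paradigm resolves which device the user is addressing by casting a ray from estimated body joints towards the registered device positions. A minimal sketch of that geometric step is shown below; the shoulder-to-wrist ray, the 0.3 m tolerance, and the device dictionary are illustrative assumptions, not the thesis' actual joint choice or thresholds.

```python
import numpy as np

def point_to_ray_distance(point, origin, direction):
    """Perpendicular distance from a 3D point to a ray; points
    behind the ray origin are treated as unreachable."""
    d = direction / np.linalg.norm(direction)
    v = point - origin
    t = np.dot(v, d)          # projection length along the ray
    if t < 0:
        return np.inf         # device lies behind the pointing arm
    return np.linalg.norm(v - t * d)

def select_device(shoulder, wrist, devices, max_dist=0.3):
    """Pick the registered device closest to the shoulder->wrist ray,
    provided it lies within max_dist metres of the ray."""
    direction = wrist - shoulder
    best_name, best_d = None, np.inf
    for name, pos in devices.items():
        dist = point_to_ray_distance(pos, shoulder, direction)
        if dist < best_d:
            best_name, best_d = name, dist
    return best_name if best_d <= max_dist else None

# Toy scene: the user points along +x towards a lamp.
devices = {"lamp": np.array([2.0, 0.1, 0.0]),
           "tv":   np.array([0.0, 2.0, 0.0])}
shoulder = np.array([0.0, 0.0, 0.0])
wrist    = np.array([0.5, 0.0, 0.0])
print(select_device(shoulder, wrist, devices))  # -> lamp
```

The distance threshold is what makes the abstract's limitation concrete: two devices closer together than the pointing error of the estimated joints cannot be discriminated by this scheme, which is one of the reasons given for the shift to the 2D state-based model.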
Document type
Degree thesis
(Master's degree)
Thesis author
Presepi, Alex
Thesis supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Human Pose Estimation, Gesture Recognition, Edge Deployment, Depth Sensing, Domain Adaptation, Home Automation, Computer Vision, Stereo Vision, Human-Computer Interaction, Hand Gesture Recognition, Convolutional Neural Networks, CLAHE, Kinect, Smart Home
Thesis defence date
26 March 2026
URI