Seamless smart home interaction via body pose or hand gesture recognition on edge devices

Sciarrillo, Alessandro (2026) Seamless smart home interaction via body pose or hand gesture recognition on edge devices. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], restricted-access document.
Full text not accessible until 1 January 2031.
Available under licence: Creative Commons: Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

The integration of computer vision into smart homes offers intuitive, gesture-based interaction paradigms. However, deploying these systems on low-power edge devices presents significant challenges. This thesis explores a progressive trajectory of interaction modalities, moving from 3D body pose estimation to 2D hand gesture recognition, driven by structural limitations encountered in real-world deployment. Initially, 3D spatial interaction via passive stereo vision is investigated. Pixel-level variance in 2D keypoints propagates through DLT triangulation, causing skeletal jitter incompatible with reliable pointing, while stereo calibration proves mechanically fragile. Transitioning to active depth sensors mitigates temporal instability but introduces computational costs prohibitive for edge deployment, and domain gaps that degrade accuracy on elbows and wrists. To enable real-time execution on edge devices, the paradigm shifts to 2D perception. While 2D arm-raising provides robust interaction, it is hindered by ergonomic effort and body occlusions. Consequently, the focus shifts to 2D hand gesture recognition. Primitive hand gestures offer highly discriminative features independent of full-body visibility. The proposed pipeline, trained on a large-scale gesture dataset, employs an illumination-invariant, color-agnostic augmentation strategy to achieve robust generalization under both visible light and complete darkness via infrared illumination. By implementing an adaptive dual-mode inference strategy, alternating between a high-resolution wide-area search and an optimized lower-resolution high-frequency tracking phase, the system guarantees broad spatial coverage and deterministic latency. This approach delivers fluid performance on a standalone Raspberry Pi 5. Ultimately, this work presents a robust, computationally efficient, and concretely deployable solution for seamless smart home interaction.
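
The abstract's remark that pixel-level variance in 2D keypoints propagates through DLT triangulation can be illustrated with a small numerical sketch. This is not the thesis pipeline. It is a generic two-view linear triangulation under assumed values: an ideal rectified stereo pair with an 800 px focal length and a 10 cm baseline, a hypothetical wrist keypoint 3 m from the cameras, and 1 px Gaussian noise on each keypoint coordinate. All names and numbers are illustrative assumptions.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build a 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X (shape (3,)) to pixel coordinates (shape (2,))."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_dlt(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point seen in two views.

    Each view contributes two rows of the homogeneous system A X = 0;
    the solution is the right singular vector of A with the smallest
    singular value.
    """
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Assumed (hypothetical) rig: identical intrinsics, second camera
# shifted 10 cm along the x axis, no rotation.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-0.10, 0.0, 0.0]))

# Hypothetical wrist position in metres, and its noise-free projections.
wrist = np.array([0.3, -0.2, 3.0])
uv1, uv2 = project(P1, wrist), project(P2, wrist)

# Re-triangulate the same static point under 1 px keypoint noise.
rng = np.random.default_rng(0)
samples = np.array([
    triangulate_dlt(P1, P2,
                    uv1 + rng.normal(0.0, 1.0, 2),
                    uv2 + rng.normal(0.0, 1.0, 2))
    for _ in range(500)
])
jitter_std = samples.std(axis=0)  # per-axis spread (x, y, z) in metres
print("per-axis std (m):", jitter_std)
```

Under these assumed values the spread along the depth axis comes out clearly larger than the lateral spread, because a short baseline turns a small disparity error into a large depth error. This is one plausible reading of the skeletal jitter the abstract describes. The actual magnitudes depend on the real rig and keypoint detector, which this record does not specify.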

Document type
Degree thesis (Master's degree, Laurea magistrale)
Thesis author
Sciarrillo, Alessandro
Thesis supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
edge computing, computer vision, gesture recognition, hand gesture recognition, gesture detection, pose estimation, human pose estimation, HPE, smart home, home automation, YOLO, YOLO11, deep learning, convolutional neural networks, domain adaptation, infrared imaging, NoIR camera, infrared, IR, Raspberry Pi, embedded systems, single-board computer, stereo vision, depth sensing, depth, Kinect, CLAHE, data augmentation, real-time inference, human-computer interaction, IoT
Thesis defence date
26 March 2026
URI
