The full text is not available at the author's request.
Abstract
Human visual perception is a powerful tool that lets us interact with the world, interpreting depth through both physiological and psychological cues. In its early days, machine vision drew mainly on physiological cues, guiding robots with bulky sensors based on focal-length adjustment, pattern matching, and binocular disparity. In reality, however, we always get a certain degree of depth sensation from the monocular image formed on the retina, which our brain interprets on empirical grounds. With the advent of deep learning, estimating depth from a monocular image has become a major research topic. It is still far from industrial use, however, since the estimated depth is valid only up to a scale factor, leaving us with relative depth information. We propose an algorithm to estimate the depth of a scene at its actual global scale, leveraging geometric constraints and state-of-the-art techniques in optical flow and depth estimation. We first compute the three-dimensional structure of multiple similar scenes, triangulating multi-view images whose dense correspondences have been estimated by an Optical Flow Estimation network. We then train a Monocular Depth Estimation network on the precomputed scenes to learn their similarities, such as object sizes, and ignore their differences, such as object arrangements. Experimental results suggest that our method learns to estimate metric depth of a novel similar scene, opening the possibility of performing Robot Guidance with an affordable, lightweight, and compact smartphone camera as a depth sensor.
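The triangulation step mentioned in the abstract can be sketched as follows: given two calibrated views and a point correspondence, linear (DLT) triangulation recovers the point's 3D position, and hence its depth, at metric scale. This is a minimal illustration under assumed values (hypothetical intrinsics and a 10 cm baseline, a single point pair rather than the dense optical-flow correspondences used in the thesis), not the actual pipeline.

```python
import numpy as np

# Assumed camera intrinsics and a 10 cm horizontal baseline (illustrative values).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera 1 at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])   # camera 2, 0.1 m to the right

def project(P, X):
    """Project a 3D point X (metres) to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence: solve A X = 0."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector = homogeneous 3D point
    return X[:3] / X[3]

X_true = np.array([0.2, -0.1, 2.0])          # a point 2 m in front of camera 1
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate(P1, P2, x1, x2)          # recovers metric depth (noise-free case)
```

Because the baseline is known in metres, the recovered depth is metric rather than up-to-scale; repeating this per dense correspondence yields the precomputed 3D scenes the depth network is trained on.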
Document type
Degree thesis
(Master's degree)
Thesis author
Toschi, Marco
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Deep Learning, Monocular Depth Estimation, Optical Flow, Human Visual Perception, Robotic Guidance, Machine Vision, Computer Vision
Thesis defense date
7 October 2021
URI