Rovinelli, Marco
(2021)
Realtime Monocular Depth Estimation on Mobile Phones.
[Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in
Artificial intelligence [LM-DM270]
Abstract
Depth estimation is necessary for understanding and navigating the environment around us. Over the years, many active sensors have been developed to measure depth, but they are expensive and require additional space to be mounted. A cheaper alternative is to estimate depth maps from images taken by a mobile phone camera. Since most mobile phones lack cameras built for stereo depth sensing, the ideal solution recovers depth from a single image using only the computational capability of the phone itself. This can be achieved by training a neural network on ground-truth depth maps. Such data is very expensive to obtain, so it is preferable to train the network with self-supervision from multiple images. Because the devices on which the trained models will be deployed have only one camera, it is best to train the network on monocular videos, which represent the actual data distribution at deployment. Self-supervised training on monocular videos, however, lowers the accuracy of the depth maps and introduces an additional challenge: depth can be predicted only up to an unknown scale factor. To address this, two additional sources of information, velocity provided by the GPS and sparse points computed by a monocular SLAM algorithm, are employed to recover scale and improve accuracy. This study investigates different neural network architectures and training schemes to obtain depth maps that are as accurate as possible within the computational budget available on modern mobile phones.
Document type
Degree thesis
(Master's degree)
Thesis author
Rovinelli, Marco
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Monocular depth estimation, self supervision, SLAM, odometry, depth estimation
Thesis defence date
8 October 2021
URI