Turra, Riccardo
(2023)
Exploring robustness to viewpoint changes by creating a dataset of simulated dashcam videos.
[Master's thesis], Università di Bologna, Degree Programme in
Computer Engineering [LM-DM270], restricted-access document.
Abstract
The issue of changing viewpoints in driving scenarios introduces significant
challenges for the performance and robustness of deep learning models trained
to solve important tasks such as monocular depth estimation or semantic Bird’s
Eye View (BEV) prediction. This research investigates the impact of various
camera positions on model predictions, focusing on the non-trivial task of
generating BEV semantic segmentation images from a monocular perspective
input.
To address the lack of suitable datasets in the literature, we created our
own, featuring diverse dashcam-like acquisitions sampled from eight different
viewpoints and totalling 112,000 images with corresponding annotations.
The evaluations we performed highlighted significant performance gaps
(-34.15%) across common dashcam placements. Additionally, we proposed a
solution to enhance model robustness, involving training with multiple camera
poses, resulting in significant improvements (+16.39%) over baseline performance.
In assessing the simulation-to-reality domain gap, we found a substantial
decrease (-70.35%) in model performance when comparing the results
obtained on synthetic data with a comparable real dataset.
In conclusion, this project contributes valuable insights into the challenges
posed by varying camera positions in driving scenarios and the utilization of
synthetically trained models on real data. The proposed solution enhances
model robustness across both seen and unseen viewpoints, contributing to the
advancement of vision models in the context of road safety and image-based
applications for driving scenarios.
Document type
Degree thesis
(Master's degree)
Thesis author
Turra, Riccardo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
Computer vision, Sim-to-real, Driving scenario, Bird's Eye View, Deep Neural Network, Viewpoint evaluation, Synthetic data
Thesis defence date
16 December 2023
URI