The full text is not available, by the author's choice. (Contact the author)
Abstract
This thesis documents the work carried out during a six-month internship at Datalogic USA Inc. in Pasadena, California.
The project aimed to design a system for detecting text lines in high-resolution images in an industrial setting, using deep learning and convolutional neural networks, with a focus on applying a general-purpose object detection system to optical character recognition (OCR). The chosen general-purpose object detector was YOLO, which at the time of writing offered a state-of-the-art trade-off between speed and accuracy.
The goal of the thesis work was to configure and specialize a general object detection convolutional neural network so as to optimize its performance for optical character recognition.
After laying the theoretical foundations, the specific object detection system (YOLO) was studied in depth, from the network architecture to the structure of its output and loss function.
The neural network framework adopted was Darknet, the same one used for the original implementation of YOLO.
Darknet is a framework for building, training and testing neural networks, written in C and CUDA and making use of the OpenCV libraries. Part of the thesis work consisted of gaining in-depth knowledge of the code and extending it with additional features.
New solutions were proposed to maximize accuracy on the given datasets and to solve technology-related problems that were impairing performance in some instances.
The results show that YOLO is impressively fast, providing a very large speedup over the OCR solution currently used by Datalogic.
It is very accurate as long as its training set features enough variability. On the other hand, it struggles to generalize to unknown patterns.
Document type
Degree thesis
(Master's degree)
Thesis author
Dolci, Beatrice
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Computer vision, deep learning, machine learning, neural networks, artificial intelligence, OCR, optical character recognition
Thesis defence date
15 March 2019
URI