Lamberti, Lorenzo
(2019)
A deep learning solution for industrial OCR applications.
[Laurea magistrale], Università di Bologna, Corso di Studio in
Ingegneria elettronica [LM-DM270], Documento ad accesso riservato.
Documenti full-text disponibili:
|
Documento PDF (Thesis)
Full-text accessibile solo agli utenti istituzionali dell'Ateneo
Disponibile con Licenza: Salvo eventuali più ampie autorizzazioni dell'autore, la tesi può essere liberamente consultata e può essere effettuato il salvataggio e la stampa di una copia per fini strettamente personali di studio, di ricerca e di insegnamento, con espresso divieto di qualunque utilizzo direttamente o indirettamente commerciale. Ogni altro diritto sul materiale è riservato
Download (29MB)
| Contatta l'autore
|
Abstract
This thesis describes a project developed throughout a six months internship in the Machine Vision Laboratory of Datalogic based in Pasadena, California.
The project aims to develop a deep learning system as a possible solution for industrial optical character recognition applications. In particular, the focus falls on a specific algorithm called You Only Look Once (YOLO), which is a general-purpose object detector based on convolutional neural networks that currently offers state-of-the-art performances in terms of trade-off between speed and accuracy. This algorithm is indeed well known for reaching impressive processing speeds, but its intrinsic structure makes it struggle in detecting small objects clustered together, which unfortunately matches our scenario: we are trying to read alphanumerical codes by detecting each single character and then reconstructing the final string. The final goal of this thesis is to overcome this drawback and push the accuracy performances of a general object detector convolutional neural network to its limits, in order to meet the demanding requirements of industrial OCR applications. To accomplish this, first YOLO's unique detecting approach was mastered in its original framework called Darknet, written in C and CUDA, then all the code was translated into Python programming language for a better flexibility, which also allowed the deployment of a custom architecture. Four different datasets with increasing complexity were used as case-studies and the final performances reached were surprising: the accuracy varies between 99.75\% and 99.97\% with a processing time of 15 ms for images $1000\times1000$ big, largely outperforming in speed the current deep learning solution deployed by Datalogic. On the downsides, the training phase usually requires a very large amount of data and time and YOLO also showed some memorization behaviours if not enough variability is given at training time.
Abstract
This thesis describes a project developed throughout a six months internship in the Machine Vision Laboratory of Datalogic based in Pasadena, California.
The project aims to develop a deep learning system as a possible solution for industrial optical character recognition applications. In particular, the focus falls on a specific algorithm called You Only Look Once (YOLO), which is a general-purpose object detector based on convolutional neural networks that currently offers state-of-the-art performances in terms of trade-off between speed and accuracy. This algorithm is indeed well known for reaching impressive processing speeds, but its intrinsic structure makes it struggle in detecting small objects clustered together, which unfortunately matches our scenario: we are trying to read alphanumerical codes by detecting each single character and then reconstructing the final string. The final goal of this thesis is to overcome this drawback and push the accuracy performances of a general object detector convolutional neural network to its limits, in order to meet the demanding requirements of industrial OCR applications. To accomplish this, first YOLO's unique detecting approach was mastered in its original framework called Darknet, written in C and CUDA, then all the code was translated into Python programming language for a better flexibility, which also allowed the deployment of a custom architecture. Four different datasets with increasing complexity were used as case-studies and the final performances reached were surprising: the accuracy varies between 99.75\% and 99.97\% with a processing time of 15 ms for images $1000\times1000$ big, largely outperforming in speed the current deep learning solution deployed by Datalogic. On the downsides, the training phase usually requires a very large amount of data and time and YOLO also showed some memorization behaviours if not enough variability is given at training time.
Tipologia del documento
Tesi di laurea
(Laurea magistrale)
Autore della tesi
Lamberti, Lorenzo
Relatore della tesi
Correlatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
Deep Leanrning,Convolutional Neural Networks,Optical Character Recognition,Object Detection,You Only Look Once,YOLO,Image Processing,Computer Vision,Industrial OCR,Artificial Intelligence
Data di discussione della Tesi
19 Dicembre 2019
URI
Altri metadati
Tipologia del documento
Tesi di laurea
(NON SPECIFICATO)
Autore della tesi
Lamberti, Lorenzo
Relatore della tesi
Correlatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
Deep Leanrning,Convolutional Neural Networks,Optical Character Recognition,Object Detection,You Only Look Once,YOLO,Image Processing,Computer Vision,Industrial OCR,Artificial Intelligence
Data di discussione della Tesi
19 Dicembre 2019
URI
Statistica sui download
Gestione del documento: