AutoML: A new methodology to automate data pre-processing pipelines

Giovanelli, Joseph (2020) AutoML: A new methodology to automate data pre-processing pipelines. [Laurea magistrale], Università di Bologna, Corso di Studio in Ingegneria e scienze informatiche [LM-DM270] - Cesena
Documenti full-text disponibili:
[img] Documento PDF (Thesis)
Disponibile con Licenza: Creative Commons: Attribuzione - Non commerciale - Non opere derivate 3.0 (CC BY-NC-ND 3.0)

Download (6MB)

Abstract

It is well known that we are living in the Big Data Era. Indeed, the exponential growth of Internet of Things, Web of Things and Pervasive Computing systems greatly increased the amount of stored data. Thanks to the availability of data, the figure of the Data Scientist has become one of the most sought, because he is capable of transforming data, performing analysis on it, and applying Machine Learning techniques to improve the business decisions of companies. Yet, Data Scientists do not scale. It is almost impossible to balance their number and the required effort to analyze the increasingly growing sizes of available data. Furthermore, today more and more non-experts use Machine Learning tools to perform data analysis but they do not have the required knowledge. To this end, tools that help them throughout the Machine Learning process have been developed and are typically referred to as AutoML tools. However, even with the presence of such tools, raw data (i.e., without being pre-processed) are rarely ready to be consumed, and generally perform poorly when consumed in a raw form. A pre-processing phase (i.e., application of a set of transformations), which improves the quality of the data and makes it suitable for algorithms is usually required. Most of AutoML tools do not consider this preliminary part, even though it has already shown to improve the final performance. Moreover, there exist a few works that actually support pre-processing, but they provide just the application of a fixed series of transformations, decided a priori, not considering the nature of the data, the used algorithm, or simply that the order of the transformations could affect the final result. In this thesis we propose a new methodology that allows to provide a series of pre-processing transformations according to the specific presented case. Our approach analyzes the nature of the data, the algorithm we intend to use, and the impact that the order of transformations could have.

Abstract
Tipologia del documento
Tesi di laurea (Laurea magistrale)
Autore della tesi
Giovanelli, Joseph
Relatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
machine learning,automade machine learning,data mining,data pre-processing,data preparation,artificial intelligence,AutoML,pre-processing,DPSO,automade data pre-processing,automade data preparation,data science
Data di discussione della Tesi
19 Marzo 2020
URI

Altri metadati

Statistica sui download

Gestione del documento: Visualizza il documento

^