AutoML: A new methodology to automate data pre-processing pipelines

Giovanelli, Joseph (2020) AutoML: A new methodology to automate data pre-processing pipelines. [Laurea magistrale], Università di Bologna, Corso di Studio in Ingegneria e scienze informatiche [LM-DM270] - Cesena

Salva citazione

Documenti full-text disponibili:

Documento PDF (Thesis)
Disponibile con Licenza: Creative Commons: Attribuzione - Non commerciale - Non opere derivate 3.0 (CC BY-NC-ND 3.0)
Download (6MB)

Abstract

It is well known that we are living in the Big Data Era. Indeed, the exponential growth of Internet of Things, Web of Things and Pervasive Computing systems greatly increased the amount of stored data. Thanks to the availability of data, the figure of the Data Scientist has become one of the most sought, because he is capable of transforming data, performing analysis on it, and applying Machine Learning techniques to improve the business decisions of companies. Yet, Data Scientists do not scale. It is almost impossible to balance their number and the required effort to analyze the increasingly growing sizes of available data. Furthermore, today more and more non-experts use Machine Learning tools to perform data analysis but they do not have the required knowledge. To this end, tools that help them throughout the Machine Learning process have been developed and are typically referred to as AutoML tools. However, even with the presence of such tools, raw data (i.e., without being pre-processed) are rarely ready to be consumed, and generally perform poorly when consumed in a raw form. A pre-processing phase (i.e., application of a set of transformations), which improves the quality of the data and makes it suitable for algorithms is usually required. Most of AutoML tools do not consider this preliminary part, even though it has already shown to improve the final performance. Moreover, there exist a few works that actually support pre-processing, but they provide just the application of a fixed series of transformations, decided a priori, not considering the nature of the data, the used algorithm, or simply that the order of the transformations could affect the final result. In this thesis we propose a new methodology that allows to provide a series of pre-processing transformations according to the specific presented case. Our approach analyzes the nature of the data, the algorithm we intend to use, and the impact that the order of transformations could have.

Abstract

Tipologia del documento

Tesi di laurea (Laurea magistrale)

Autore della tesi

Giovanelli, Joseph

Relatore della tesi

Golfarelli, Matteo

Scuola

Ingegneria e Architettura

Corso di studio

Ingegneria e scienze informatiche [LM-DM270] - Cesena

Ordinamento Cds

DM270

Parole chiave

machine learning,automade machine learning,data mining,data pre-processing,data preparation,artificial intelligence,AutoML,pre-processing,DPSO,automade data pre-processing,automade data preparation,data science

Data di discussione della Tesi

19 Marzo 2020

URI

https://amslaurea.unibo.it/id/eprint/20422

Altri metadati

Statistica sui download

Vedi altre statistiche

Gestione del documento:

Strumenti di navigazione

Collezioni AlmaDL

AutoML: A new methodology to automate data pre-processing pipelines

Abstract

Altri metadati

Statistica sui download