Fratus, Marta
(2023)
Working with big data from ingestion to prediction:
an experimental approach on air pollution ARPA data.
[Master's degree thesis], Università di Bologna, Degree Programme in
Mathematics [LM-DM270], full-text document not available
The full text is not available, by the author's choice.
Abstract
This thesis explores environmental data provided by ARPAE through a structured approach to data processing, analytics, and predictive modeling. The primary objective is to clarify the complexities of environmental quality, from data collection and cleansing to in-depth analysis and forecasting. The first chapter gives a detailed overview of Extract, Transform, Load (ETL) processes, explaining their theoretical framework, the issues encountered, and the solutions applied. Talend Open Studio for Data Integration is introduced along with its components, showing their role in transforming raw ARPAE data into a structured and usable format. DBeaver is presented as a database management tool that facilitates data organization. Finally, all the tables created in the database and the jobs are shown in detail.
In Chapter 2 the focus shifts to data analytics with Power BI. The aim of this chapter is to visualize and analyze the data collected in the previous one. Informative dashboards are central, visually representing trends in key environmental parameters. We carefully examine the data, paying close attention to whether they comply with legal standards.
The final chapter turns to predictive analysis, introducing linear regression, ETS, and ARIMA models as tools for forecasting future environmental data from historical information. These models are applied to critical parameters such as PM10, PM2.5 and O3, with the aim of predicting their values for the year 2023. We then compare these predictions with the partial data available for 2023. By integrating technical methodologies, analytical insights, and predictive capabilities, this thesis aims to contribute to a detailed understanding of both historical trends and potential future trajectories in environmental quality.
Document type
Degree thesis
(Master's degree)
Thesis author
Fratus, Marta
Thesis supervisor
School
Degree programme
Curriculum
CURRICULUM ADVANCED MATHEMATICS FOR APPLICATIONS
Degree programme regulations
DM270
Keywords
pollution, data integration, data analytics, data visualization, predictive models, ETL, ARPAE
Thesis defence date
22 December 2023
URI