Engineering Data Pipelines and Analytics with DataOps

Folin, Veronika (2024) Engineering Data Pipelines and Analytics with DataOps. [Master's thesis], Università di Bologna, Degree programme in Ingegneria e scienze informatiche [LM-DM270] - Cesena, Restricted-access document.
Available full-text documents:
PDF document (Thesis)
Full text accessible only to institutional users of the University
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved

Abstract

Organizations today struggle to use Big Data effectively for data-driven decision-making. Access to reliable datasets is crucial: without it, business insights can be incomplete or erroneous and lead to misguided conclusions. Consequently, companies are adopting new methodologies and advanced data management tools to make their processes more trustworthy. In this context, concepts such as DataOps and Analytics Engineering, and tools like dbt, are gaining popularity. DataOps draws inspiration from DevOps and agile methodologies to accelerate data delivery. Analytics Engineering is an emerging discipline that focuses on ensuring clean, tested, and well-documented data. dbt is a recent open-source command-line tool designed to help analytics engineers transform data in their data warehouse more efficiently while adhering to software engineering best practices. This thesis investigates how dbt simplifies the implementation of DataOps principles when building a data pipeline. The proposed solution spans the entire data platform, from initial ingestion to analysis, through a reliable transformation process. Specifically, we increase the automation of data processes and make model development and the use of cloud resources more effective. We also focus on monitoring data quality and applying data governance principles. Overall, this solution can serve as a starting point for managing datasets in real production environments.

Document type
Degree thesis (Master's degree)
Thesis author
Folin, Veronika
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Business Intelligence, DataOps, Analytics Engineering, dbt, ELT, Data Warehouse
Thesis defence date
15 March 2024
URI
