Imboccioli, Filippo
(2024)
Managing Data Complexity within a Big Data Platform through Data Management Solutions.
[Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in
Artificial intelligence [LM-DM270], Full-text document not available
The full text is not available by the author's choice.
Abstract
Navigating the complex terrain of data lineage within extensive data lakes, especially in the nuanced corporate landscape of a financial institution, presents a formidable challenge. At the heart of this exploration is the integration of Apache Spark into the pipeline to trace internal components' data streams, areas where traditional lineage tools often reach their limits due to specific corporate technological constraints and requirements.
The crux of this research lies in its approach: a thorough scientific investigation culminating in the adoption of the Spline Spark Agent. This solution is not a mere adaptation but a groundbreaking response to the absence of adequate lineage mechanisms in this particular corporate context. By delving into the core principles of data lineage, this work pioneers the creation of low-level components, thus establishing a new Corporate Technological Standard for tracking data transformations and lineage.
This thesis transcends standard implementation, addressing a critical gap in the corporate environment and significantly advancing the domain of data management research. The integration and assessment of the Spline Spark Agent within the company's data infrastructure not only showcase a substantial leap forward in data governance and decision-making processes but also underscore the value of scientific inquiry in resolving real-world problems.
Occupying a unique nexus between theoretical research and practical application, this study underscores the indispensable role of bespoke data lineage solutions in navigating the complexities of modern corporate data management and governance. Through this contribution to the field of data management, the research illuminates a path forward for organizations grappling with similar challenges, emphasizing the critical intersection of innovation and utility in technological advancement.
Document type
Degree thesis
(Master's degree, Laurea magistrale)
Thesis author
Imboccioli, Filippo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
Data Lineage, Apache Spark, Spline Framework, Data Lake, Data Governance
Thesis defence date
19 March 2024
URI