Imboccioli, Filippo (2024)
Managing Data Complexity within a Big Data Platform through Data Management Solutions.
[Master's degree thesis], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270], full-text document not available
    
The full text is not available, by the author's choice.
    
      Abstract
      Navigating the complex terrain of data lineage within extensive data lakes, especially in the nuanced corporate landscape of a financial institution, presents a formidable challenge. At the heart of this exploration is the integration of Apache Spark into the pipeline to trace internal components' data streams, areas where traditional lineage tools often reach their limits due to specific corporate technological constraints and requirements.
The crux of this research lies in its approach: a thorough scientific investigation culminating in the adoption of the Spline Spark Agent. This solution is not merely an adaptation but a groundbreaking response to the absence of adequate lineage mechanisms in this particular corporate context. By delving into the core principles of data lineage, this work pioneers the creation of low-level components, thus establishing a new Corporate Technological Standard for tracking data transformations and lineage.
This thesis transcends standard implementation, addressing a critical gap in the corporate environment and significantly advancing the domain of data management research. The integration and assessment of the Spline Spark Agent within the company's data infrastructure not only showcase a substantial leap forward in data governance and decision-making processes but also underscore the value of scientific inquiry in resolving real-world problems.
Occupying a unique nexus between theoretical research and practical application, this study underscores the indispensable role of bespoke data lineage solutions in navigating the complexities of modern corporate data management and governance. Through this contribution to the field of data management, the research illuminates a path forward for organizations grappling with similar challenges, emphasizing the critical intersection of innovation and utility in technological advancement.
     
    
Document type: Degree thesis (Master's degree)
Thesis author: Imboccioli, Filippo
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme: Artificial intelligence [LM-DM270]
Degree programme regulations (Ordinamento CdS): DM270
Keywords: Data Lineage, Apache Spark, Spline Framework, Data Lake, Data Governance
Thesis defence date: 19 March 2024
URI: