Evaluating Large Language Models for Dimensional Fact Model Design with Automated Pipelines

Rubboli, Luca (2025) Evaluating Large Language Models for Dimensional Fact Model Design with Automated Pipelines. [Master's degree thesis], Università di Bologna, Degree Programme in Ingegneria e scienze informatiche [LM-DM270] - Cesena
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Download (2MB)

Abstract

This work investigates the use of large language models for the conceptual design of multidimensional data warehouses, comparing supply-driven and demand-driven approaches. In the supply-driven approach, Dimensional Fact Model schemata are generated from source relational schemas, whereas in the demand-driven approach, schemata are generated from textual end-user requirements. Multiple LLMs are evaluated, including GPT, LLaMA, Falcon and Mistral, using automated pipelines for YAML-based schema extraction, metrics computation and visualization. Evaluation metrics include node- and edge-level precision, recall and F1-score, as well as custom error metrics reflecting domain-specific schema errors. Experiments are run on CPU and GPU environments, with automated scripts ensuring reproducibility and consistent execution across multiple runs. Results show that prompt engineering significantly improves model performance: for supply-driven design, average F1-scores nearly double, while for demand-driven design, careful prompt design increases scores by up to 20%. GPT-5 demonstrates slight improvements over GPT-4, particularly in capturing relational dependencies. The study also highlights practical limitations, including memory constraints with larger import models, variability in execution times and the need for manual post-processing rules. Future work includes expanding the exercise dataset, developing automated alignment strategies, exploring interactive multi-turn schema design and experimenting with fine-tuning large import models to enhance both accuracy and efficiency. These results provide a systematic foundation for leveraging LLMs in automated data warehouse conceptual design, balancing effectiveness and computational resources.

Document type
Degree thesis (Master's degree)
Thesis author
Rubboli, Luca
Thesis supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
LLMs, DFM, Conceptual Modeling, Business Intelligence, Supply driven, demand driven, prompt engineering
Thesis defence date
2 October 2025
URI
