Evaluation of Synthetic Data's Impact on Financial Predictive Models

Conca, Edoardo (2025) Evaluation of Synthetic Data's Impact on Financial Predictive Models. [Master's degree thesis], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], Full-text document not available

The full text is not available, by the author's choice.

Abstract

Synthetic data generation has emerged as a promising solution to overcome data scarcity, privacy constraints, and class imbalance in high-stakes domains such as finance. This thesis investigates the use of generative models to create synthetic tabular data tailored for predictive tasks including credit scoring, fraud detection, and loan default classification. The study focuses on evaluating whether synthetic data can effectively substitute or complement real datasets in machine learning pipelines without compromising performance or regulatory compliance. The methodology involves training and tuning state-of-the-art generative models across three experimental scenarios: synthetic-only training, minority class augmentation, and statistical-predictive benchmarking. Each scenario is assessed using a combination of traditional classification metrics (Precision, Recall, F1-score, ROC AUC, etc.) and synthetic-specific utility and similarity metrics (e.g., KSComplement, TVComplement, BinaryClassifierEfficacy). Results show that, under carefully optimized conditions, synthetic data can achieve predictive performance comparable to that of real data while enhancing fairness and protecting privacy. Augmenting real datasets with targeted synthetic samples proves especially effective in mitigating class imbalance. However, the quality of the synthetic data is highly sensitive to the structure and pre-processing of the original dataset. The thesis concludes that while synthetic data is not a universal substitute, it is a valuable and increasingly mature tool for enriching real-world financial datasets, enabling responsible and scalable machine learning applications in regulated environments.
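
The abstract names KSComplement and TVComplement as similarity metrics without describing them. As a rough illustration only, the sketch below assumes the usual definitions of these names (as used, for example, in the SDMetrics library): KSComplement is one minus the two-sample Kolmogorov-Smirnov statistic for a numerical column, and TVComplement is one minus the total variation distance for a categorical column. The function names and the toy columns are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """Return 1 minus the two-sample KS statistic for a numerical column."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    # The largest gap between two empirical CDFs is reached at an observed value.
    points = set(real_sorted) | set(synth_sorted)
    ks_stat = max(
        abs(
            bisect_right(real_sorted, x) / len(real_sorted)
            - bisect_right(synth_sorted, x) / len(synth_sorted)
        )
        for x in points
    )
    return 1.0 - ks_stat


def tv_complement(real, synthetic):
    """Return 1 minus the total variation distance for a categorical column."""
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    # Counter returns 0 for categories missing from one of the columns.
    tvd = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - tvd


# Hypothetical toy columns, for illustration only.
real_labels = ["ok", "ok", "ok", "fraud"]
synth_labels = ["ok", "ok", "fraud", "fraud"]
label_score = tv_complement(real_labels, synth_labels)

real_amounts = [10.0, 20.0, 30.0, 40.0]
synth_amounts = [10.0, 20.0, 30.0, 40.0]
amount_score = ks_complement(real_amounts, synth_amounts)
```

Both scores lie in [0, 1]; values closer to 1 indicate that the synthetic column's marginal distribution is closer to the real one.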

Document type

Degree thesis (Master's degree)

Thesis author

Conca, Edoardo

Thesis supervisor

Salti, Samuele

Thesis co-supervisor

Pini, Michele

School

Engineering and Architecture

Degree programme