Handling Data Imbalance and Bias Propagation in Machine Learning for Building Stock Analysis

Gaiani, Giacomo (2025) Handling Data Imbalance and Bias Propagation in Machine Learning for Building Stock Analysis. [Laurea magistrale], Università di Bologna, Degree programme in Artificial Intelligence [LM-DM270], full-text document not available.
The full text is not available at the author's request.

Abstract

This thesis addresses two major challenges in large-scale machine learning applications for building stock analysis: class imbalance and bias propagation in sequential ML pipelines. Accurate characterization of building attributes, such as type, size, and construction year, is essential in domains including disaster risk assessment, urban planning, and energy modeling. However, official data sources are often incomplete or inconsistent, so machine learning is used to infer the missing information. This work proposes and evaluates a combined methodology that pairs data balancing techniques (oversampling, undersampling, and hybrid approaches) with bootstrap aggregating (bagging) and model calibration. The goals are to improve prediction quality for minority classes and to reduce the bias amplification caused by synthetic data in sequential pipelines. Experiments on three real-world datasets show notable improvements in macro-F1 scores, reflecting better minority-class performance, without sacrificing overall accuracy. In addition, post-training calibration and bagging made the models more robust to synthetic data. These results highlight the importance of context-aware approaches and lay the groundwork for developing and applying data systems such as ETHOS.BUILDA.
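The abstract does not specify the models, resampling ratios, or calibration method used in the thesis (the full text is unavailable), so the following is only a minimal, standard-library sketch of one way to combine random oversampling with bagging and to score the result with macro-F1. The Gaussian naive Bayes base learner, the stratified bootstrap, the soft-voting rule, the synthetic data, and every function name (`macro_f1`, `random_oversample`, `fit_balanced_bagging`, and so on) are illustrative assumptions, not the author's implementation. Post-training calibration is not shown.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch: names, base learner and data are illustrative, not from the thesis.


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each minority class counts as much as the majority."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def group_by_class(X, y):
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    return by_class


def random_oversample(X, y, rng):
    """Duplicate random rows of smaller classes until every class matches the largest one."""
    by_class = group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        rows = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out


def fit_gaussian_nb(X, y, eps=1e-3):
    """Toy base learner: per-class prior plus per-feature mean and variance."""
    model = {}
    for label, rows in group_by_class(X, y).items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps
                     for col, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_proba_nb(model, x):
    """Normalised class probabilities from log prior + Gaussian log-likelihood."""
    logp = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        logp[label] = ll
    top = max(logp.values())
    exp = {c: math.exp(v - top) for c, v in logp.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def fit_balanced_bagging(X, y, n_bags=10, seed=0):
    """Each bag: stratified bootstrap (no class can vanish), then random oversampling."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        Xb, yb = [], []
        for label, rows in group_by_class(X, y).items():
            sample = [rng.choice(rows) for _ in rows]
            Xb.extend(sample)
            yb.extend([label] * len(sample))
        models.append(fit_gaussian_nb(*random_oversample(Xb, yb, rng)))
    return models


def predict_bagged(models, x):
    """Soft voting: average the bags' class probabilities and return (label, probabilities)."""
    avg = defaultdict(float)
    for model in models:
        for label, p in predict_proba_nb(model, x).items():
            avg[label] += p / len(models)
    return max(avg, key=avg.get), dict(avg)


# Usage on synthetic, imbalanced 1-D data (95 vs 5 samples); evaluated on the training set only.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(95)] + [[data_rng.gauss(2.0, 1.0)] for _ in range(5)]
y = [0] * 95 + [1] * 5
plain = fit_gaussian_nb(X, y)
plain_preds = [max(p, key=p.get) for p in (predict_proba_nb(plain, xi) for xi in X)]
models = fit_balanced_bagging(X, y, n_bags=10)
bagged_preds = [predict_bagged(models, xi)[0] for xi in X]
print("macro-F1, plain model:      ", round(macro_f1(y, plain_preds), 3))
print("macro-F1, balanced bagging: ", round(macro_f1(y, bagged_preds), 3))
```

Because minority rows are duplicated separately inside each bag, different bags see different synthetic copies, and soft voting averages out some of that variation; whether this matches the thesis setup cannot be confirmed from the abstract. A real evaluation would score a held-out split rather than the training data, and macro-F1 matters here because a model that always predicts the majority class can reach high accuracy while scoring poorly on it.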

Document type
Degree thesis (Laurea magistrale, Master's degree)
Thesis author
Gaiani, Giacomo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
machine learning, building stock analysis, class imbalance, bias propagation, model calibration
Thesis defense date
22 July 2025
URI
