The full text is not available, by the author's choice.
Abstract
This thesis develops and evaluates a comprehensive pipeline for extracting Environmental, Social, and Governance (ESG) key performance indicators (KPIs) from corporate websites and sustainability reports. The pipeline combines web discovery, a Large Language Model (LLM) for information extraction, and a deterministic post-processing step that standardizes noisy outputs into structured, analysis-ready tables. A case study carried out at Illuminem evaluates field-level accuracy and explores the challenges posed by unstructured data formats, variability in model outputs, and inconsistent reporting practices.
Document type
Degree thesis
(Master's degree)
Thesis author
Imanbayeva, Leilya
Thesis supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
ESG, sustainability reporting, data extraction, large language models, NLP, information retrieval, automation, data validation, hybrid pipeline, corporate disclosures
Thesis defence date
27 October 2025
URI