Developing an Automated ESG Data Extraction and Analysis Tool with NLP Techniques and Large Language Models

Pieroni, Francesco (2023) Developing an Automated ESG Data Extraction and Analysis Tool with NLP Techniques and Large Language Models. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270], Full-text document not available
The full text is not available, by the author's choice.


This thesis proposes an AI-driven system for processing corporate sustainability reports. The end-to-end pipeline applies natural language processing techniques and models to extract ESG data from PDF documents efficiently and accurately. It comprises four stages: keyword classification, categorization with a fine-tuned ESG-BERT model, GPT-based extraction, and Pix2Struct-based extraction from visualizations. The thesis also investigates the optimal conditions and constraints of the GPT-based and Pix2Struct models during report analysis. The evaluation covers pre-extraction, keyword classification, ESG-BERT classification, and GPT extraction, and compares GPT-4 with GPT-3.5-turbo. To address challenges in information extraction and improve the user experience, the research presents a verification station: a web application in which users can navigate ESG reports and verify, update, and modify the extracted data. This interface reduces manual labour while improving the accuracy of the extraction process. The investigation highlights the system's efficiency and precision in extracting critical ESG KPIs from corporate sustainability documents. It contributes to natural language processing applications in ESG analysis, benefiting investors and ESG analysts and consolidating the literature in the field. By providing the ESG KPI extractor, the study gives stakeholders reliable and comprehensive data, supporting informed decisions and a more sustainable and ethical corporate future.
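
The staged design described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, since the full text is unavailable: the keyword lists, the `Passage` type, and the `run_pipeline` function are all hypothetical, and the ESG-BERT and GPT stages are represented only as optional callables that a caller would supply. The sketch shows one plausible reason for putting keyword classification first: a cheap filter can discard passages with no ESG vocabulary before any model is invoked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical keyword lists; the vocabulary used in the thesis is not published.
ESG_KEYWORDS: Dict[str, List[str]] = {
    "environmental": ["emission", "energy", "waste", "water"],
    "social": ["employee", "diversity", "safety", "community"],
    "governance": ["board", "audit", "compliance", "ethics"],
}


@dataclass
class Passage:
    page: int
    text: str
    labels: List[str] = field(default_factory=list)


def keyword_classify(passage: Passage) -> Passage:
    """Tag a passage with every ESG pillar whose keywords it mentions."""
    lowered = passage.text.lower()
    passage.labels = [
        pillar
        for pillar, words in ESG_KEYWORDS.items()
        if any(word in lowered for word in words)  # plain substring match
    ]
    return passage


def run_pipeline(
    passages: List[Passage],
    classifier: Optional[Callable[[Passage], List[str]]] = None,
    extractor: Optional[Callable[[Passage], dict]] = None,
) -> List[dict]:
    """Apply the keyword filter, then any model stages the caller supplies."""
    results = []
    for passage in passages:
        passage = keyword_classify(passage)
        if not passage.labels:
            continue  # no ESG keyword found: skip the model stages entirely
        if classifier is not None:
            # Placeholder for a fine-tuned classifier such as ESG-BERT.
            passage.labels = classifier(passage)
        # Placeholder for an LLM-based KPI extractor.
        kpis = extractor(passage) if extractor is not None else {}
        results.append({"page": passage.page, "labels": passage.labels, "kpis": kpis})
    return results
```

Calling `run_pipeline` with only a list of `Passage` objects runs the keyword stage alone. Substring matching is deliberately crude here (for example, "board" also matches "onboarding"); a real system would likely tokenize or use word-boundary patterns.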

Document type
Master's thesis (Laurea magistrale)
Thesis author
Pieroni, Francesco
Thesis supervisor
Thesis co-supervisor
Degree programme
Artificial intelligence [LM-DM270]
Degree programme regulations
DM270
Keywords
Thesis defence date
20 July 2023
