Development of a Retrieval-Enhanced Chatbot for the Italian Workplace Safety Regulation in Low-Resource Settings

Fu, Weijie (2025) Development of a Retrieval-Enhanced Chatbot for the Italian Workplace Safety Regulation in Low-Resource Settings. [Laurea (bachelor's thesis)], Università di Bologna, degree programme in Ingegneria e scienze informatiche [L-DM270] - Cesena, full-text document not available
The full text is not available, by the author's choice.

Abstract

Hallucination remains a major challenge when applying large pretrained language models to specialized domains such as legal or regulatory text. In contexts like Italian workplace safety regulations, unreliable or unsupported responses can have serious consequences. To address this issue, this thesis presents the design, implementation, and evaluation of a retrieval-enhanced chatbot tailored to Italian workplace safety regulations and optimized for low-resource deployment. The proposed system couples a compact pretrained language model with a document retriever so that generated responses are grounded in official regulatory texts and can include explicit citations. To validate the approach, we compare the chatbot's performance with and without the retriever on multiple-choice question benchmarks derived from workplace safety regulation materials. The results indicate that, while language models contain some latent knowledge of the domain, the retrieval-augmented configuration consistently yields higher accuracy and greater reliability. We also attempt to fine-tune the retriever to improve recall; however, the off-the-shelf retriever is already strong, and fine-tuning produces negligible gains. These findings confirm that retrieval augmentation can substantially improve the reliability and accuracy of language models in specialized, low-resource regulatory domains.
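
As a rough illustration of the with/without-retriever comparison described in the abstract, the Python sketch below retrieves passages for a multiple-choice question, prepends them to the prompt with citable identifiers, and scores predictions by accuracy. The abstract does not specify the retriever, the language model, or the benchmark data, so everything here is an assumption: the bag-of-words cosine retriever is a toy stand-in, the passages are invented placeholders rather than regulatory text, and the names `retrieve`, `build_prompt`, and `accuracy` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT actual regulatory text.
PASSAGES = {
    "doc-1": "The employer must assess all risks to the health and safety of workers.",
    "doc-2": "Workers must use personal protective equipment provided by the employer.",
    "doc-3": "Emergency exits must be kept clear and clearly marked at all times.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the ids of the k passages most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(t))), pid) for pid, t in PASSAGES.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    return [pid for score, pid in scored[:k] if score > 0]


def build_prompt(question, options, use_retriever=True):
    """Build a multiple-choice prompt, optionally grounded in retrieved passages."""
    lines = []
    if use_retriever:
        # Each passage carries its id so the model can cite it explicitly.
        lines += [f"[{pid}] {PASSAGES[pid]}" for pid in retrieve(question)]
    lines.append(f"Question: {question}")
    lines += [f"{label}) {text}" for label, text in options.items()]
    lines.append("Answer with one letter and cite the passage id used, if any.")
    return "\n".join(lines)


def accuracy(predictions, gold):
    """Fraction of predicted option letters that match the gold letters."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

In this setup, the same question set would be run twice, once with `use_retriever=True` and once with `use_retriever=False`, and the two `accuracy` values compared. A real system would presumably replace the toy retriever with dense embeddings stored in a vector database, as the record's keywords suggest.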

Document type
Bachelor's thesis (Laurea)
Thesis author
Fu, Weijie
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Large Language Models, Retrieval Augmented Generation, Efficient Artificial Intelligence, Vector Database, Knowledge-Enhanced Chatbot
Thesis defence date
27 November 2025
URI
