Design, Implementation and Benchmarking of a Retrieval-Augmented Chatbot for the Insurance Sector

Bosi, Tancredi (2025) Design, Implementation and Benchmarking of a Retrieval-Augmented Chatbot for the Insurance Sector. [Master's degree thesis], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]

Full-text documents available:

PDF document (Thesis)
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and a single copy may be saved and printed for strictly personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.
Download (1MB)

Abstract

Information access in the insurance sector is unusually demanding: professionals must navigate extensive collections of complex policy documents across multiple sectors and companies. Manual consultation creates productivity bottlenecks that limit analytical capabilities and service efficiency. This thesis presents the design, implementation, and evaluation of a Retrieval-Augmented Generation (RAG) system developed specifically for the insurance domain. The system uses OpenAI's language models and file search capabilities to let insurance brokers retrieve relevant policy information through natural language queries, with precise source attribution. The research addresses both practical commercial development and fundamental questions about RAG system performance in specialized professional domains. The evaluation methodology is a multi-dimensional framework that combines automated metrics with expert assessment across 30 benchmark questions developed with experienced insurance professionals. The benchmark captures authentic broker queries spanning multiple insurance sectors, each with the precise document-page citations that professional verification requires. The system achieves 95.0% retrieval recall, and in the expert evaluation 93.3% of responses score 4 or higher on a five-point professional utility scale. The analysis also reveals important limitations of standard automated metrics in specialized domains: lexical similarity measures significantly underestimate response quality because correct answers often paraphrase the source in appropriate domain-specific terms. Following this evaluation, the system was deployed as a production web application that currently serves insurance professionals, giving access to complex policy information in seconds rather than through time-intensive manual processes.
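The two headline figures in the abstract, retrieval recall and the share of responses rated 4 or higher, can be illustrated with a minimal Python sketch. The abstract does not say how the thesis computes them, so the sketch assumes that recall is measured per question as the fraction of gold (document, page) citations found among the retrieved ones, and then averaged over questions. The function names, file names, and toy data are hypothetical and do not come from the thesis.

```python
from typing import Iterable

# A citation is a (document name, page number) pair, mirroring the
# document-page citations mentioned in the abstract.
Citation = tuple[str, int]


def retrieval_recall(gold: set[Citation], retrieved: Iterable[Citation]) -> float:
    """Fraction of gold citations that appear among the retrieved citations."""
    if not gold:
        raise ValueError("gold citation set must not be empty")
    return len(gold & set(retrieved)) / len(gold)


def mean_recall(pairs: Iterable[tuple[set[Citation], list[Citation]]]) -> float:
    """Average per-question recall (an assumed aggregation, not the thesis's)."""
    pairs = list(pairs)
    return sum(retrieval_recall(gold, found) for gold, found in pairs) / len(pairs)


def share_at_or_above(scores: Iterable[int], threshold: int = 4) -> float:
    """Share of expert scores that reach the threshold on a five-point scale."""
    scores = list(scores)
    return sum(score >= threshold for score in scores) / len(scores)


# Toy data for illustration only; these are not results from the thesis.
benchmark = [
    ({("policy_a.pdf", 12)}, [("policy_a.pdf", 12), ("policy_b.pdf", 3)]),
    ({("policy_b.pdf", 7), ("policy_b.pdf", 8)}, [("policy_b.pdf", 7)]),
]
expert_scores = [5, 4, 3, 5]

print(mean_recall(benchmark))            # 0.75
print(share_at_or_above(expert_scores))  # 0.75
```

Under this reading, 93.3% of 30 benchmark questions corresponds to 28 responses scoring 4 or higher.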

Document type

Degree thesis (Master's degree)

Thesis author

Bosi, Tancredi

Thesis supervisor

Torroni, Paolo

Thesis co-supervisor

Vita, Marco

School

Engineering and Architecture

Degree programme