Design, Implementation and Benchmarking of a Retrieval-Augmented Chatbot for the Insurance Sector

Bosi, Tancredi (2025) Design, Implementation and Benchmarking of a Retrieval-Augmented Chatbot for the Insurance Sector. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Program in Artificial intelligence [LM-DM270]
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed for strictly personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

The insurance sector presents unique challenges for information access, where professionals must navigate extensive collections of complex policy documents across multiple sectors and companies. Traditional manual consultation methods create significant productivity bottlenecks that limit analytical capabilities and service efficiency. This thesis presents the design, implementation, and evaluation of a Retrieval-Augmented Generation (RAG) system specifically developed for insurance domain applications. The system leverages OpenAI's language models and file search capabilities to enable insurance brokers to access relevant policy information through natural language queries with precise source attribution. The research addresses both practical commercial development and fundamental questions about RAG system performance in specialized professional domains. The evaluation methodology employs a multi-dimensional framework combining automated metrics with expert assessment across 30 benchmark questions developed with experienced insurance professionals. The benchmark captures authentic broker queries spanning multiple insurance sectors with precise document-page citations essential for professional verification. Results demonstrate exceptional performance with 95.0% retrieval recall and expert evaluations revealing 93.3% of responses achieve scores of 4 or higher on a five-point professional utility scale. The analysis reveals important limitations of standard automated metrics in specialized domains, where lexical similarity measures significantly underestimate response quality due to appropriate domain-specific paraphrasing. Following successful evaluation, the system has been deployed as a production web application currently serving insurance professionals, enabling rapid access to complex policy information in seconds rather than through time-intensive manual processes.

Document type
Master's thesis (Laurea magistrale)
Thesis author
Bosi, Tancredi
Thesis supervisor
Thesis co-supervisor
School
Degree program
Degree program regulations
DM270
Keywords
RAG, Chatbot, Insurance Domain, Benchmark Construction, Evaluation of Responses
Thesis defense date
7 October 2025
URI
