Proposal for industry RAG evaluation: Generative Universal Evaluation of LLMs and Information retrieval

Gueli, Gianluca (2024) Proposal for industry RAG evaluation: Generative Universal Evaluation of LLMs and Information retrieval. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Informatica (Computer Science) [LM-DM270], Restricted-access document.

Full-text documents available:

PDF document (Thesis)
Full text accessible only to institutional users of the University
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed for strictly personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

This thesis reports on my internship at Bitapp, a software development company located in Bologna. My work focused on the design and implementation of a chatbot utilising Retrieval-Augmented Generation (RAG) for an e-learning platform. My main objective was to develop a virtual teacher capable of providing contextually relevant responses to user queries. However, a RAG system has many parameters, such as chunk size and embedding model, and no single setting suits every use case. It is therefore essential to evaluate the RAG system in order to identify the most appropriate parameters. Creating benchmarks was a significant challenge, as existing academic benchmarks do not cover the private data involved. To address this, a novel approach for generating benchmarks was developed under conditions critical to the business. The approach included an evaluation of the retrieval system and an analysis of the relationship between chunk size and embedding model, using hit rate as the metric. The results indicated that the optimal configuration for managing private business data was a chunk size of 500, with paraphrase-multilingual-mpnet-base-v2 as the embedding model and gemma2 9b-instruct-q3_K_M as the language model. The work lays a foundation on which a framework for generating benchmarks on one's own private data may be built in the future.
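As an illustration of the hit rate metric mentioned in the abstract, the following is a minimal Python sketch and not the thesis's implementation. It assumes each benchmark question has exactly one relevant chunk and that the retriever returns a ranked list of chunk ids. The function name hit_rate_at_k, the cut-off k, and the toy data are all hypothetical.

```python
from typing import Dict, Sequence


def hit_rate_at_k(
    retrieved: Dict[str, Sequence[str]],
    relevant: Dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of queries whose relevant chunk id appears in the top-k results."""
    if not relevant:
        raise ValueError("benchmark is empty")
    hits = 0
    for query_id, gold_chunk_id in relevant.items():
        # A query with no retrieval output counts as a miss.
        top_k = list(retrieved.get(query_id, []))[:k]
        if gold_chunk_id in top_k:
            hits += 1
    return hits / len(relevant)


# Hypothetical toy benchmark: query id -> id of the chunk that answers it.
benchmark = {"q1": "c3", "q2": "c7", "q3": "c1", "q4": "c9"}

# Hypothetical ranked retrieval output for one (chunk size, embedding model) setting.
run = {
    "q1": ["c3", "c2", "c5"],
    "q2": ["c4", "c7", "c8"],
    "q3": ["c6", "c2", "c1"],
    "q4": ["c2", "c3", "c4"],
}

print(hit_rate_at_k(run, benchmark, k=1))  # 0.25
print(hit_rate_at_k(run, benchmark, k=3))  # 0.75
```

Under these assumptions, comparing configurations amounts to computing this score once per combination of chunk size and embedding model on the same benchmark and keeping the highest-scoring combination.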

Document type

Degree thesis (Laurea magistrale, master's degree)

Thesis author

Gueli, Gianluca

Thesis supervisor

Tamburini, Fabio

Thesis co-supervisor

Sensidoni, Simone

School

Sciences

Degree programme