Proposal for industry RAG evaluation: Generative Universal Evaluation of LLMs and Information retrieval

Gueli, Gianluca (2024) Proposal for industry RAG evaluation: Generative Universal Evaluation of LLMs and Information retrieval. [Master's thesis], Università di Bologna, Degree Programme in Computer Science [LM-DM270], restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text accessible only to institutional users of the University
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.


Abstract

This thesis reports on my internship at Bitapp, a software development company located in Bologna. My work focused on the design and implementation of a chatbot that uses Retrieval-Augmented Generation (RAG) for an e-learning platform. The main objective was to develop a virtual teacher capable of providing contextually relevant responses to user queries. A RAG system, however, exposes many parameters, such as chunk size and embedding model, and no single setting is guaranteed to suit every use case. The system must therefore be evaluated in order to identify the most appropriate parameters. Creating benchmarks was a significant challenge, because existing academic benchmarks do not cover the private data involved. To address these issues, a novel approach for generating benchmarks was developed under conditions critical to the business. The approach included an evaluation of the retrieval system and an analysis of the relationship between chunk size and embedding model, using hit rate as the assessment metric. The results indicated that the optimal configuration for managing private business data was a chunk size of 500, with paraphrase-multilingual-mpnet-base-v2 as the embedding model and gemma2 9b-instruct-q3_K_M as the language model. The work lays a foundation on which a framework for generating benchmarks on one's own private data may be built in the future.
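As an illustration of the kind of retrieval assessment the abstract describes, the following is a minimal sketch of a hit rate computation over several configurations. It assumes the common top-k definition of hit rate (a query counts as a hit when its reference chunk appears among the first k retrieved chunks); the thesis may define the metric differently. The function names, the chunk and query identifiers, and the configuration labels are all hypothetical and do not come from the thesis.

```python
from typing import Dict, Sequence


def hit_rate(
    retrieved: Dict[str, Sequence[str]],
    relevant: Dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of queries whose reference chunk id is in the top-k results.

    retrieved maps a query id to its ranked list of retrieved chunk ids.
    relevant maps a query id to the single chunk id expected for it.
    """
    if not relevant:
        raise ValueError("no queries to evaluate")
    hits = 0
    for query_id, gold_chunk in relevant.items():
        top_k = list(retrieved.get(query_id, []))[:k]
        if gold_chunk in top_k:
            hits += 1
    return hits / len(relevant)


def best_configuration(scores: Dict[str, float]) -> str:
    """Return the configuration label with the highest hit rate."""
    if not scores:
        raise ValueError("no configurations to compare")
    return max(scores, key=lambda label: scores[label])


if __name__ == "__main__":
    # Hypothetical benchmark: two queries, each with one reference chunk.
    relevant = {"q1": "c1", "q2": "c2"}

    # Hypothetical ranked results for two made-up configurations.
    runs = {
        "config_a": {"q1": ["c3", "c1"], "q2": ["c9", "c8"]},
        "config_b": {"q1": ["c1", "c4"], "q2": ["c2", "c7"]},
    }

    scores = {label: hit_rate(run, relevant, k=2) for label, run in runs.items()}
    for label, score in scores.items():
        print(f"{label}: hit rate@2 = {score:.2f}")
    print("best:", best_configuration(scores))
```

In practice each configuration label would stand for one combination of chunk size and embedding model, and the ranked lists would come from running the retriever over the generated benchmark questions.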

Document type
Degree thesis (Master's degree)
Thesis author
Gueli, Gianluca
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Track
Curriculum B: Computer Science for Management
Degree programme regulations
DM270
Keywords
RAG,RAG-eval,benchmarks creation,LLM,Embedding model,Chunk size,chatbot,artificial intelligence
Thesis defence date
19 December 2024
URI
