Building and Evaluating a Document-Grounded QA System: A RAG-Based Approach

De Faveri, Alessandro (2026) Building and Evaluating a Document-Grounded QA System: A RAG-Based Approach. [Laurea magistrale], Università di Bologna, Corso di Studio in Digital transformation management [LM-DM270] - Cesena
Full-text documents available:
PDF document (Thesis)
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.

Download (528kB)

Abstract

Large Language Models are increasingly used to support study and research tasks, yet their adoption in academic settings is constrained by the need for verifiability and traceability. In particular, when answering questions about scientific papers, responses must be grounded in primary sources and should enable readers to locate the supporting evidence, ideally at page level. This thesis addresses the problem of question answering over a local corpus of faculty-authored papers that are not reliably covered by an LLM’s training data, aiming to improve correctness while providing citations that support rapid validation. To this end, the thesis designs and implements an end-to-end Retrieval-Augmented Generation (RAG) system for scientific PDFs. The pipeline performs page-level PDF text extraction, punctuation-aware chunking with overlap (chunk size 1000, overlap 200), embedding computation using all-MiniLM-L6-v2, and indexing in Qdrant with provenance metadata (source, page). At query time, the system retrieves the top-k most similar chunks (TOP_K = 5), constructs a prompt that injects the retrieved evidence, and generates an answer using either local LLM backends via Ollama or an optional cloud backend. Prompt engineering is treated as a first-class design variable through five prompt templates and an optional open-knowledge mode. The evaluation is conducted on an internal benchmark of 10 questions with reference answers and expected provenance. Results show that prompt design significantly affects answer alignment: the strict template (T4) achieves the highest average similarity across backends, outperforming more permissive templates. Overall, the work demonstrates that combining dense retrieval, provenance-aware indexing, and carefully designed prompts can improve controllability and traceability for academic paper question answering, and it outlines future directions focused on prompt optimisation and decomposition-driven prompting for complex multi-document queries.
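Two of the pipeline stages described in the abstract — punctuation-aware chunking with overlap and provenance, and evidence-injecting prompt construction — can be sketched in plain Python. The function names and the strict-template wording below are illustrative assumptions, not the thesis's actual code; the embedding (all-MiniLM-L6-v2) and Qdrant indexing/search stages are omitted to keep the sketch self-contained.

```python
# Sketch of two RAG pipeline stages: punctuation-aware chunking with
# overlap (values from the abstract), and prompt construction that
# injects retrieved evidence with (source, page) provenance.

CHUNK_SIZE = 1000   # characters per chunk
OVERLAP = 200       # characters shared between consecutive chunks


def chunk_pages(pages, chunk_size=CHUNK_SIZE, overlap=OVERLAP):
    """Split (source, page, text) triples into overlapping chunks,
    preferring to cut at sentence-ending punctuation so chunks stay
    semantically coherent. Each chunk keeps provenance metadata."""
    chunks = []
    for source, page, text in pages:
        start = 0
        while start < len(text):
            end = min(start + chunk_size, len(text))
            if end < len(text):
                # Prefer the last ., ! or ? in the second half of the
                # window; fall back to a hard cut at chunk_size.
                cut = max(text.rfind(p, start + chunk_size // 2, end)
                          for p in ".!?")
                if cut != -1:
                    end = cut + 1
            chunks.append({"text": text[start:end],
                           "source": source, "page": page})
            if end == len(text):
                break
            start = end - overlap  # step back to create the overlap
    return chunks


# Hypothetical wording in the spirit of the "strict" template T4:
# answer only from retrieved evidence and cite source and page.
STRICT_TEMPLATE = (
    "Answer ONLY using the excerpts below. If the answer is not "
    "contained in them, say so. Cite (source, page) for every claim.\n\n"
    "{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question, retrieved, template=STRICT_TEMPLATE):
    """Inject the top-k retrieved chunks, with provenance, into a prompt."""
    context = "\n\n".join(
        f"[{c['source']} p.{c['page']}] {c['text']}" for c in retrieved)
    return template.format(context=context, question=question)
```

In the full system the `retrieved` list would come from a Qdrant similarity search over all-MiniLM-L6-v2 embeddings with TOP_K = 5; here any list of chunk dictionaries works, which keeps the two stages independently testable.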

Document type: Master's thesis (Laurea magistrale)
Thesis author: De Faveri, Alessandro
Thesis supervisor:
School:
Degree programme:
Degree regulations: DM270
Keywords: Rag-System, LLM, Machine Learning, Data Mining
Thesis defence date: 19 March 2026
URI:
