VerifAi: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers

Cassano, Lorenzo (2024) VerifAi: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270]

Full-text documents available:

PDF document (Thesis)
Available under license: Subject to any broader authorization granted by the author, the thesis may be freely consulted, and one copy may be saved and printed for strictly personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved
Download (639kB)

Abstract

This research investigates the effectiveness of transformer-based models in mitigating hallucinations within the biomedical domain, a crucial area in natural language processing (NLP). Hallucinations occur when language models generate unsupported or divergent information. Despite their capabilities, large language models (LLMs) are prone to such errors, impacting critical sectors like biomedicine. The study has two main objectives: exploring methods like Retrieval-Augmented Generation (RAG) and Retrieval-Augmented Fine-Tuning (RAFT) to reduce hallucinations, and developing techniques for detecting persistent hallucinations. These efforts aim to limit and identify hallucinations in transformer-based models. Additionally, the research introduces a biomedical RAG system to enhance response reliability, using fine-tuned LLMs with PubMed abstracts. This system outperforms the PubMed search engine and GPT-4 Turbo in referencing relevant abstracts. The study also presents a Verification Engine for an open-source scientific QA system, using models fine-tuned on the SciFact dataset. The DeBERTa model achieved an F1 score of 88%, outperforming other models on the HealthVer dataset. These findings advance NLP techniques, particularly in biomedicine, by improving the accuracy and reliability of transformer-based models.

Document type

Degree thesis (Laurea magistrale, master's degree)

Thesis author

Cassano, Lorenzo

Thesis supervisor

Torroni, Paolo

Thesis co-supervisor

Milosevic, Nikola

School

Engineering and Architecture (Ingegneria e Architettura)

Degree programme