Diagnose Your Text: A Plausibility Estimation Model for Medical Statements

Bedeschi, Federica (2024) Diagnose Your Text: A Plausibility Estimation Model for Medical Statements. [Bachelor's thesis (Laurea)], Università di Bologna, degree programme in Ingegneria e scienze informatiche [L-DM270] - Cesena, restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text not accessible until 1 September 2025.
Available under license: Creative Commons: Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)


Abstract

Today's large language models (LLMs) generate text that is practically indistinguishable from human writing, but they are prone to hallucinating information. Fact-checking is therefore needed, particularly in sensitive and critical domains such as medicine, where misinformation can have life-threatening consequences. The task is challenging because machine-generated text is of such high quality, and building highly effective fact-checking tools for specialized domains remains an open problem in the literature. In this thesis, we introduce Med-vera, a step toward automatically quantifying the plausibility of declarative medical statements. To better align with real-world scenarios, we focus on evidence-free fact-checking, which takes only a statement as input and does not require a trustworthy source passage for comparison. Using LLaMA-2-chat, Med-vera verbalizes existing heterogeneous medical resources to generate over 1 million correct and incorrect artificial statements, which serve as the basis for training closed-book classification models. The converted resources are question-answering datasets (BioASQ, MedMCQA, PubMedQA) and knowledge graphs (UMLS).
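
As a rough illustration of the verbalization idea described in the abstract, the sketch below turns one multiple-choice question-answering record into labeled declarative statements. It is only a minimal sketch under stated assumptions: the thesis performs this step with LLaMA-2-chat, whose prompts are not given here, so a fixed string template stands in for the model. The names `MultipleChoiceItem`, `statement_template`, and `verbalize`, as well as the toy record, are hypothetical and do not come from the thesis or from any of the listed datasets.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MultipleChoiceItem:
    """Hypothetical container for one multiple-choice QA record."""
    statement_template: str  # declarative form of the question with an {answer} slot
    options: List[str]       # candidate answers
    correct_index: int       # position of the correct answer in `options`


def verbalize(item: MultipleChoiceItem) -> List[Tuple[str, int]]:
    """Return (statement, label) pairs: label 1 for the correct option, 0 otherwise.

    Assumption: a string template replaces the LLM-based verbalization.
    """
    pairs = []
    for i, option in enumerate(item.options):
        statement = item.statement_template.format(answer=option)
        pairs.append((statement, int(i == item.correct_index)))
    return pairs


# Toy record, invented for illustration only (not taken from any dataset).
item = MultipleChoiceItem(
    statement_template="Scurvy is caused by a deficiency of {answer}.",
    options=["vitamin C", "vitamin D", "vitamin K"],
    correct_index=0,
)
training_pairs = verbalize(item)
```

Pairs of this kind, one correct statement alongside several incorrect ones, are the sort of input a closed-book classifier could be trained on, since each statement carries its label without any accompanying evidence passage.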

Document type
Bachelor's thesis (Laurea)
Thesis author
Bedeschi, Federica
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Large Language Models, Natural Language Generation, Prompt Engineering, Fact-Checking, Medical Domain
Thesis defense date
15 March 2024
URI

