Bio-QA-GNN: Reasoning with Language Models and Knowledge Graphs for Interpretable Biomedical Question Answering

Gnagnarella, Enrico (2022) Bio-QA-GNN: Reasoning with Language Models and Knowledge Graphs for Interpretable Biomedical Question Answering. [Laurea magistrale], Università di Bologna, Corso di Studio in Ingegneria e scienze informatiche [LM-DM270] - Cesena, Documento ad accesso riservato.
Documenti full-text disponibili:
[img] Documento PDF (Thesis)
Full-text accessibile solo agli utenti istituzionali dell'Ateneo
Disponibile con Licenza: Salvo eventuali più ampie autorizzazioni dell'autore, la tesi può essere liberamente consultata e può essere effettuato il salvataggio e la stampa di una copia per fini strettamente personali di studio, di ricerca e di insegnamento, con espresso divieto di qualunque utilizzo direttamente o indirettamente commerciale. Ogni altro diritto sul materiale è riservato

Download (3MB) | Contatta l'autore

Abstract

Nowadays the idea of injecting world or domain-specific structured knowledge into pre-trained language models (PLMs) is becoming an increasingly popular approach for solving problems such as biases, hallucinations, huge architectural sizes, and explainability lack—critical for real-world natural language processing applications in sensitive fields like bioinformatics. One recent work that has garnered much attention in Neuro-symbolic AI is QA-GNN, an end-to-end model for multiple-choice open-domain question answering (MCOQA) tasks via interpretable text-graph reasoning. Unlike previous publications, QA-GNN mutually informs PLMs and graph neural networks (GNNs) on top of relevant facts retrieved from knowledge graphs (KGs). However, taking a more holistic view, existing PLM+KG contributions mainly consider commonsense benchmarks and ignore or shallowly analyze performances on biomedical datasets. This thesis start from a propose of a deep investigation of QA-GNN for biomedicine, comparing existing or brand-new PLMs, KGs, edge-aware GNNs, preprocessing techniques, and initialization strategies. By combining the insights emerged in DISI's research, we introduce Bio-QA-GNN that include a KG. Working with this part has led to an improvement in state-of-the-art of MCOQA model on biomedical/clinical text, largely outperforming the original one (+3.63\% accuracy on MedQA). Our findings also contribute to a better understanding of the explanation degree allowed by joint text-graph reasoning architectures and their effectiveness on different medical subjects and reasoning types. Codes, models, datasets, and demos to reproduce the results are freely available at: \url{https://github.com/disi-unibo-nlp/bio-qagnn}.

Abstract
Tipologia del documento
Tesi di laurea (Laurea magistrale)
Autore della tesi
Gnagnarella, Enrico
Relatore della tesi
Correlatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
Natural Language Understanding,Knowledge Graph,Subgraph Retrieval,Graph Neural Networks,Language Modeling
Data di discussione della Tesi
15 Dicembre 2022
URI

Altri metadati

Statistica sui download

Gestione del documento: Visualizza il documento

^