Comprehensive study of clinical entity extraction and classification using Large Language Models

Faedi, Michele (2023) Comprehensive study of clinical entity extraction and classification using Large Language Models. [Master's degree thesis], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and a single copy may be saved and printed for strictly personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

Clinical entities are terms used by specialist doctors to refer to specific biomedical concepts. NLP tasks have recently benefited from the development of large language models, which can capture the semantics of a sentence and reason about it. A key feature of large language models is their ability to acquire knowledge during training and apply it as needed, a capability that is crucial for biomedical natural language analysis. Entity extraction is a task that, given unstructured text, aims to locate and classify concepts so that the retrieved information can be used in a subsequent task. It is well known in the literature as Named Entity Recognition (NER). State-of-the-art models perform very well when enough data is available and the entities are generic. We investigate the efficacy of various techniques for NER in the clinical domain, where the amount of data available to train models is limited. In this challenging domain, MAPS S.P.A. developed a rule-based pipeline that extracts concepts from unstructured text. This pipeline uses a 'suggester' to produce candidates, which are later filtered and processed to return the desired concepts. The aim of this project is to study the accuracy of the 'suggester' in various settings in order to derive general conclusions that can be adapted to Italian clinical documents. Our investigation covers three distinct methods for tackling the problem: the EntityRecognizer, the SpanCategorizer, and various generative approaches.
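The suggest-then-filter idea described in the abstract can be illustrated with a minimal sketch. It is not the MAPS S.P.A. pipeline, whose rules are not given here: the n-gram suggester, the lexicon-based filter, the function names and the example sentence are all assumptions chosen only to show how candidate spans might be produced first and narrowed down afterwards.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # token offsets, end-exclusive


def ngram_suggester(tokens: List[str], max_len: int = 3) -> List[Span]:
    """Hypothetical suggester: propose every contiguous span of up to max_len tokens."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def filter_candidates(
    tokens: List[str], spans: List[Span], lexicon: Set[str]
) -> List[Tuple[str, Span]]:
    """Hypothetical filter: keep candidates whose lowercased text is in a toy lexicon."""
    kept: List[Tuple[str, Span]] = []
    for start, end in spans:
        text = " ".join(tokens[start:end]).lower()
        if text in lexicon:
            kept.append((text, (start, end)))
    return kept


# Illustrative input; the sentence and lexicon are invented for this sketch.
tokens = "Patient reports chest pain and atrial fibrillation".split()
lexicon = {"chest pain", "atrial fibrillation"}

candidates = ngram_suggester(tokens, max_len=2)
entities = filter_candidates(tokens, candidates, lexicon)
print(entities)
```

In this toy setting the suggester deliberately over-generates and the filter does the selection. A learned classifier could take the place of the lexicon lookup, which is one way to read the comparison of methods outlined in the abstract.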

Document type
Degree thesis (Master's degree)
Thesis author
Faedi, Michele
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
NLP, biomedical NLP, LLM, Large Language Models, decoder-only, BERT, BERT-like
Thesis defence date
16 December 2023
URI
