Comprehensive study of clinical entity extraction and classification using Large Language Models

Faedi, Michele (2023) Comprehensive study of clinical entity extraction and classification using Large Language Models. [Master's thesis], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270]

Full-text documents available:

PDF document (Thesis)
Available under license: Unless the author grants broader permissions, the thesis may be freely consulted, and a single copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved

Abstract

Clinical entities are terms that specialist doctors use to refer to specific biomedical concepts. Natural language processing (NLP) tasks have recently benefited from the development of large language models that can capture the semantics of a sentence and reason about it. A key feature of large language models is their ability to acquire knowledge during training and apply it as needed, a capability that is crucial for biomedical natural language analysis. Entity extraction is the task of locating and classifying the concepts in an unstructured text so that the retrieved information can be used in a subsequent task. It is well known in the literature as Named Entity Recognition (NER). State-of-the-art models perform very well when enough data is available and the entities are generic. We investigate the efficacy of various NER techniques in the clinical domain, where the data available for training models is limited. In this challenging domain, MAPS S.P.A. developed a rule-based pipeline that extracts concepts from unstructured text. The pipeline uses a 'suggester' to produce candidates, which are then filtered and processed to return the desired concepts. The aim of this project is to study the accuracy of the 'suggester' in various settings and to derive general conclusions that can be adapted to Italian clinical documents. Our investigation covers three distinct approaches to the problem: the EntityRecognizer, the SpanCategorizer, and various generative approaches.
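
The suggest-then-filter idea described in the abstract can be illustrated with a minimal Python sketch. It is not the MAPS S.P.A. pipeline or the thesis implementation, and it does not use any NLP library: the function names `ngram_suggester` and `filter_candidates`, the toy `LEXICON`, and the example sentence are all hypothetical, chosen only to show how candidate spans might be proposed and then reduced to labelled concepts.

```python
from typing import Iterator

# A span is a (start, end) pair of token offsets, with end exclusive.
Span = tuple[int, int]


def ngram_suggester(tokens: list[str], max_len: int = 3) -> Iterator[Span]:
    """Propose every contiguous span of 1..max_len tokens as a candidate."""
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            yield (start, end)


def filter_candidates(
    tokens: list[str],
    candidates: Iterator[Span],
    lexicon: dict[str, str],
) -> list[tuple[int, int, str]]:
    """Keep only candidates whose lower-cased text is in the lexicon."""
    found = []
    for start, end in candidates:
        text = " ".join(tokens[start:end]).lower()
        if text in lexicon:
            found.append((start, end, lexicon[text]))
    return found


# Hypothetical toy lexicon; a real system would use a trained model
# or curated clinical rules instead of exact string lookup.
LEXICON = {"atrial fibrillation": "DISEASE", "warfarin": "DRUG"}

tokens = "Patient with atrial fibrillation started on warfarin".split()
entities = filter_candidates(tokens, ngram_suggester(tokens), LEXICON)

for start, end, label in entities:
    print(" ".join(tokens[start:end]), label)
```

In this sketch the suggester deliberately over-generates (every n-gram up to three tokens), and all of the precision comes from the filtering step. That mirrors the division of labour the abstract describes, where the quality of the candidates bounds what the later stages can return.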

Document type

Degree thesis (Master's degree)

Thesis author

Faedi, Michele

Thesis supervisor

Torroni, Paolo

Thesis co-supervisor

Galassi, Andrea ; Grundler, Giulia ; Emiliani, Vieri

School

Engineering and Architecture

Degree programme