Faedi, Michele
(2023)
Comprehensive study of clinical entity extraction and classification using Large Language Models.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in
Artificial intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.
Download (1MB)
Abstract
Clinical entities are terms used by specialist doctors to denote specific biomedical concepts.
NLP tasks have recently benefited from the development of large language models that can capture the semantics of a sentence and reason about it. A key feature of large language models is their ability to acquire knowledge during training and apply it as needed, a capability that is crucial for biomedical natural language analysis.
Entity extraction is the task of locating and classifying concepts in unstructured text so that the retrieved information can be used in a downstream task. It is well known in the literature as Named Entity Recognition (NER). State-of-the-art models perform very well when given enough data and when the entities are generic.
We investigate the efficacy of various techniques for NER in the clinical domain, where little training data is available. In this challenging domain, MAPS S.P.A. developed a rule-based pipeline that extracts concepts from unstructured text. The pipeline uses a 'suggester' to produce candidates, which are then filtered and processed to return the desired concepts.
The aim of this project is to study the accuracy of the 'suggester' in various settings and to derive general conclusions that can be applied to Italian clinical documents.
Our investigation encompasses three distinct methods to tackle the problem: the EntityRecognizer, the SpanCategorizer, and various generative approaches.
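The suggester-then-filter idea described above can be illustrated with a small sketch. This is not the MAPS S.P.A. pipeline, whose internals the abstract does not describe: the n-gram suggester, the lexicon-based filter, the recall measure, the example sentence, and every name below are hypothetical choices made only for illustration, using the Python standard library.

```python
from typing import List, Set, Tuple

# A span is (start_token, end_token) with an exclusive end index.
Span = Tuple[int, int]


def ngram_suggester(tokens: List[str], max_len: int = 3) -> List[Span]:
    """Hypothetical suggester: propose every contiguous span of 1..max_len tokens."""
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1)
    ]


def filter_candidates(
    tokens: List[str], candidates: List[Span], lexicon: Set[str]
) -> List[Span]:
    """Hypothetical filter: keep candidates whose lower-cased text is in the lexicon."""
    return [
        (start, end)
        for (start, end) in candidates
        if " ".join(tokens[start:end]).lower() in lexicon
    ]


def candidate_recall(candidates: List[Span], gold: List[Span]) -> float:
    """Fraction of gold spans that the suggester proposed at all."""
    if not gold:
        return 1.0
    return len(set(candidates) & set(gold)) / len(gold)


# Toy Italian sentence, toy lexicon, and toy gold annotation (all assumed).
tokens = "Paziente con diabete mellito e ipertensione arteriosa".split()
lexicon = {"diabete mellito", "ipertensione arteriosa"}
gold = [(2, 4), (5, 7)]

candidates = ngram_suggester(tokens, max_len=3)
entities = filter_candidates(tokens, candidates, lexicon)

print(len(candidates), "candidates proposed")
print("kept:", [" ".join(tokens[s:e]) for s, e in entities])
print("candidate recall:", candidate_recall(candidates, gold))
```

Measuring recall at the candidate stage reflects one design consideration in such a pipeline: a span that the suggester never proposes cannot be recovered by any later filtering step.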
Document type
Degree thesis
(Laurea magistrale)
Thesis author
Faedi, Michele
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations (Ordinamento Cds)
DM270
Keywords
nlp,biomedical nlp,LLM,Large Language Models,decoder-only,bert,BERT,BERT-like
Thesis defense date
16 December 2023
URI