Bollino, Emmanuele
(2023)
Automatic Terminology Coding for the Biomedical Domain.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in
Artificial intelligence [LM-DM270]
Abstract
The biomedical sector, rich in unstructured data from sources such as clinical notes and health records, presents a prime opportunity for Natural Language Processing (NLP) applications. Especially pivotal is the task of entity linking, in which textual mentions are mapped to medical concepts in a knowledge base, here the Unified Medical Language System (UMLS) Metathesaurus. In this setting, Italian is a low-resource language: only 4% of the 4M UMLS concepts have an Italian label. Current systems such as MAPS Group’s Clinika software rely on label matching to link extracted facts to the corresponding UMLS concepts. This dissertation presents the design of a new Clinika component aimed at improving entity linking of Italian terms against UMLS, even when no direct Italian label exists. Using transformer-based multilingual embeddings, a novel 'concept guesser' architecture was developed to tackle the linking problem while making the most of the knowledge currently available. This work not only enhances Clinika’s effectiveness but also paves the way for advanced multilingual clinical decision support systems.
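The contrast the abstract draws between label matching and embedding-based linking can be illustrated with a minimal sketch. This is not the thesis's 'concept guesser' architecture, whose details are not given here. It is only a toy nearest-neighbour ranking under stated assumptions: the concept identifiers, labels, and 3-dimensional vectors are invented, and in a real system the vectors would come from a multilingual transformer encoder.

```python
import math

# Hypothetical toy knowledge base: concept id -> (English label, toy embedding).
# Ids and vectors are invented for illustration; they are not UMLS data.
CONCEPTS = {
    "CONCEPT_A": ("headache", [0.9, 0.1, 0.0]),
    "CONCEPT_B": ("fever", [0.1, 0.9, 0.1]),
    "CONCEPT_C": ("cough", [0.0, 0.2, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_match(mention):
    """Exact label matching: succeeds only if the mention equals a stored label."""
    for concept_id, (label, _) in CONCEPTS.items():
        if label == mention.lower():
            return concept_id
    return None


def rank_by_embedding(mention_vec, top_k=2):
    """Rank concepts by cosine similarity to the mention embedding."""
    scored = [(cosine(mention_vec, vec), cid) for cid, (_, vec) in CONCEPTS.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:top_k]]


# An Italian mention with no Italian label in the toy knowledge base.
mention = "mal di testa"
# Assumed encoder output: placed near "headache" by hand for this example.
mention_vec = [0.8, 0.2, 0.1]

print(label_match(mention))            # no label matches the Italian string
print(rank_by_embedding(mention_vec))  # candidates ordered by similarity
```

In this toy setup, label matching finds nothing for the Italian mention, while the similarity ranking can still propose candidate concepts because the mention and the English labels are assumed to share one embedding space.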
Document type
Degree thesis
(Master's degree)
Thesis author
Bollino, Emmanuele
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
NLP, Embedding, Ranking, UMLS, Biomedicine, BERT, Graph KB, Multilingual, Entity Linking
Thesis defence date
21 October 2023
URI