Automatic Terminology Coding for the Biomedical Domain

Bollino, Emmanuele (2023) Automatic Terminology Coding for the Biomedical Domain. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under licence: Creative Commons: Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)



The biomedical sector, rich in unstructured data from sources such as clinical notes and health records, offers a prime opportunity for Natural Language Processing (NLP) applications. A particularly important task is entity linking, in which textual mentions are mapped to medical concepts within a knowledge base, here the Unified Medical Language System (UMLS) Metathesaurus. Italian is poorly resourced in this setting: only 4% of the approximately 4 million UMLS concepts have an Italian label. Current systems such as MAPS Group's Clinika software rely on label matching to link extracted facts to the corresponding UMLS concepts, an approach that depends on the label being available in Italian. This dissertation presents the design of a new Clinika component that improves entity linking of Italian terms against UMLS, even when no direct Italian label exists. Using transformer-based multilingual embeddings, a novel 'concept guesser' architecture was developed to address the linking problem while making the most of the knowledge currently available. The component improves Clinika's effectiveness and paves the way for more advanced multilingual clinical decision support systems.
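To make the contrast between label matching and embedding-based linking concrete, here is a minimal sketch under stated assumptions: an exact match on Italian labels is tried first, and if it fails, concepts are ranked by cosine similarity between the mention embedding and the embeddings of their labels in any language, so a concept that has only English labels can still be retrieved. The toy 3-dimensional vectors, the placeholder concept IDs (not real UMLS CUIs), and the `cosine` and `link` functions are all hypothetical. This is not Clinika's API or the thesis's concept guesser; a real system would obtain the vectors from a multilingual transformer encoder.

```python
import math

# Hypothetical toy knowledge base: placeholder concept IDs (not real UMLS CUIs)
# mapped to (label, language, embedding) triples. The 3-d vectors are made-up
# stand-ins for the output of a multilingual transformer encoder.
KB = {
    "CUI_HEADACHE": [("Headache", "ENG", [0.9, 0.1, 0.0]),
                     ("cefalea", "ITA", [0.88, 0.12, 0.02])],
    "CUI_FEVER": [("Fever", "ENG", [0.1, 0.9, 0.1])],       # no Italian label
    "CUI_COUGH": [("Coughing", "ENG", [0.0, 0.2, 0.95])],   # no Italian label
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(mention, mention_vec, kb, top_k=2):
    """Return up to top_k (concept_id, score) candidates for an Italian mention."""
    # Step 1: exact, case-insensitive match against Italian labels
    # (a label-matching baseline); a hit is returned with score 1.0.
    for cui, labels in kb.items():
        for label, lang, _ in labels:
            if lang == "ITA" and label.lower() == mention.lower():
                return [(cui, 1.0)]
    # Step 2: fallback ranking. Score each concept by its best cosine similarity
    # over labels in any language, so English-only concepts remain reachable.
    scored = [(cui, max(cosine(mention_vec, vec) for _, _, vec in labels))
              for cui, labels in kb.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(link("cefalea", [0.88, 0.12, 0.02], KB))  # exact Italian label match
    print(link("febbre", [0.12, 0.85, 0.15], KB))   # top candidate: CUI_FEVER
```

In this sketch "febbre" has no Italian label in the toy knowledge base, yet it is linked to the fever concept through its English label's embedding. Over millions of concepts, the linear scan in step 2 would likely be replaced by a vector index.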

Document type
Thesis (Laurea magistrale)
Thesis author
Bollino, Emmanuele
Thesis supervisor
Thesis co-supervisor
Degree programme
Degree programme regulations
Keywords
NLP, Embedding, Ranking, UMLS, Biomedicine, BERT, Graph KB, Multilingual, Entity Linking
Thesis defence date
21 October 2023
