Faedi, Michele (2023)
Comprehensive study of clinical entity extraction and classification using Large Language Models.
[Master's thesis], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270]
   
  
  
        
        
	
  
  
  
  
  
  
  
    
  
    
      Full-text documents available:
      PDF document (Thesis). Available under license: unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly forbidden. All other rights to the material are reserved.
      Download (1MB)
        
      
    
  
  
    
      Abstract
      Clinical entities are the terms specialist doctors use to refer to specific biomedical concepts.
NLP tasks have recently received a boost from the development of large language models, which can capture the semantics of a sentence and reason over it. A key feature of large language models is their ability to acquire knowledge during training and apply it as needed, a capability crucial for biomedical natural language analysis.
Entity extraction is the task of locating and classifying concepts in unstructured text so that the retrieved information can be used in a subsequent task. It is well known in the literature as Named Entity Recognition (NER). State-of-the-art models perform very well when provided with enough data and when the entities are generic.
We investigate the efficacy of various techniques for NER in the clinical domain, where the amount of data available to train models is limited. In this challenging domain, MAPS S.P.A. developed a rule-based pipeline that extracts concepts from unstructured text. This pipeline uses a 'suggester' to produce candidates that are later filtered and processed to return the desired concepts.
The aim of this project is to study the accuracy of the 'suggester' in various environments in order to derive general conclusions that can be adapted to Italian clinical documents.
Our investigation covers three distinct methods for tackling the problem: the EntityRecognizer, the SpanCategorizer, and various generative approaches.
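The suggest-then-filter idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MAPS S.P.A. pipeline and not the method evaluated in the thesis: the function names `suggest_ngrams` and `filter_candidates`, the n-gram strategy, the lexicon lookup, and the English example sentence are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def suggest_ngrams(tokens: List[str], max_len: int = 3) -> List[Span]:
    """Hypothetical suggester: propose every contiguous span of 1..max_len tokens."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        for size in range(1, max_len + 1):
            end = start + size
            if end > len(tokens):
                break
            spans.append((start, end))
    return spans


def filter_candidates(
    tokens: List[str], spans: List[Span], lexicon: Set[str]
) -> List[Tuple[Span, str]]:
    """Hypothetical filter: keep only candidates whose text appears in a lexicon."""
    kept: List[Tuple[Span, str]] = []
    for start, end in spans:
        text = " ".join(tokens[start:end]).lower()
        if text in lexicon:
            kept.append(((start, end), text))
    return kept


# Toy input; the lexicon stands in for whatever filtering a real pipeline applies.
tokens = "Patient reports chest pain and atrial fibrillation".split()
lexicon = {"chest pain", "atrial fibrillation"}

candidates = suggest_ngrams(tokens, max_len=2)
concepts = filter_candidates(tokens, candidates, lexicon)
print(concepts)
```

In this sketch the suggester deliberately over-generates, and all precision comes from the filtering step; a learned component could replace either stage.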
     
    
     
  
  
    
    
      Document type
      Degree thesis (Master's degree)

          Thesis author
          Faedi, Michele

          Thesis supervisor

          Thesis co-supervisor

          School

          Degree programme

          Degree programme regulations
          DM270

          Keywords
          nlp, biomedical nlp, LLM, Large Language Models, decoder-only, bert, BERT, BERT-like

          Thesis defense date
          16 December 2023

      URI
      
      
     
   
  
   
  
  
  
  
  
    
      
        