Bonetti, Lorenzo
 
(2021)
Design and implementation of a real-world search engine based on Okapi BM25 and SentenceBERT.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270]
   
      Full-text documents available:
      
        
          
            PDF document (Thesis), 2 MB. Available under the following license: except where the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal study, research, and teaching purposes; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.
        
      
    
  
  
    
      Abstract
      The work conducted in this thesis presents a hybrid model for a real-world search engine application. The project was carried out during an internship at a startup working in Knowledge Management and Artificial Intelligence. The goal of the internship was to improve the existing search engine and build a new system for a future web application use case. An in-depth study of the limitations of keyword search alone, and of semantic search, revealed the need to move from a purely keyword-based information retrieval system to a hybrid model that uses both keyword search and semantic search. In particular, the old system relied on a TF-IDF-based algorithm, while the final model tries to overcome the limits of keyword search by combining Okapi BM25, a probabilistic information retrieval approach, with newer semantic search models based on SentenceBERT. The implemented models and algorithm make extensive use of recent Information Retrieval techniques such as lexical search, similarity search, query expansion, document expansion, and automatic question generation. The models were tested on a banking dataset belonging to one of the company's clients, originally created for an Information Retrieval chatbot. Several experiments led to a final model that improves search performance, with clear advantages over both keyword search alone and pure semantic search.
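      The abstract describes combining Okapi BM25 (lexical) scores with SentenceBERT (semantic) scores but does not state how the two are fused. The following is a minimal, self-contained sketch of one common approach, not the thesis implementation: it assumes each score list is min-max normalized and then combined with a weighted sum controlled by a hypothetical parameter `alpha`. BM25 here uses the non-negative IDF variant log(1 + (N - n + 0.5)/(n + 0.5)), with k1 = 1.5 and b = 0.75 as assumed, commonly used values. Dense vectors are taken as given: in a real system they would come from a SentenceBERT encoder (for example `SentenceTransformer.encode` in the sentence-transformers library), whereas here hand-written 3-dimensional vectors stand in for them. All function names and the toy documents are hypothetical and not taken from the thesis dataset.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(x - lo) / (hi - lo) for x in values]


def hybrid_rank(query_tokens, docs_tokens, query_vec, doc_vecs, alpha=0.5):
    """Rank document indices by alpha * BM25 + (1 - alpha) * cosine, both min-max normalized."""
    lexical = min_max(bm25_scores(query_tokens, docs_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * lex + (1 - alpha) * sem for lex, sem in zip(lexical, semantic)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Toy data for illustration only (not from the thesis dataset).
docs = [
    "how do I block my credit card",
    "open a new savings account",
    "report a lost or stolen card",
]
docs_tokens = [d.lower().split() for d in docs]
query_tokens = "lost card".split()
# Hypothetical 3-d vectors standing in for SentenceBERT embeddings.
query_vec = [0.85, 0.2, 0.05]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]

bm25 = bm25_scores(query_tokens, docs_tokens)
semantic_only = [cosine(query_vec, v) for v in doc_vecs]
ranking = hybrid_rank(query_tokens, docs_tokens, query_vec, doc_vecs, alpha=0.5)
print(ranking)  # [2, 0, 1]
```

      With these toy vectors, cosine similarity alone ranks document 0 ("block my credit card") first, while the fused score ranks document 2 ("report a lost or stolen card") first, because BM25 rewards the exact match on "lost". Setting `alpha` to 1.0 or 0.0 recovers pure keyword or pure semantic ranking, respectively.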
     
    
     
  
  
    
    
      Document type
      Degree thesis (Laurea magistrale)
      
      
      
      
        
      
        
          Thesis author
          Bonetti, Lorenzo
          
        
      
        
          Thesis supervisor
          
          
        
      
        
      
        
          School
          
          
        
      
        
          Degree programme
          Artificial intelligence [LM-DM270]
        
      
        
      
        
      
        
          Programme regulations (Ordinamento CdS)
          DM270
          
        
      
        
          Keywords
          Okapi BM25, SentenceBERT, Keyword search, Semantic search, Question generation, Information Retrieval, Document expansion
          
        
      
        
          Thesis defence date
          3 December 2021
          
        
      
      URI
      
      
     
   
  
   
  
  
  
  
  
    