Design and implementation of a real-world search engine based on Okapi BM25 and SentenceBERT

Bonetti, Lorenzo (2021) Design and implementation of a real-world search engine based on Okapi BM25 and SentenceBERT. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and a single copy may be saved and printed strictly for personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

This thesis presents a hybrid model for a real-world search engine application. The project was part of an internship carried out at a start-up working in Knowledge Management and Artificial Intelligence. The aim of the internship was to improve the company's existing search engine and to build a new system for a future web application use case. An in-depth study of the limitations of keyword search alone, and of semantic search, revealed the need to move from a purely keyword-based information retrieval system to a hybrid model that uses both keyword search and semantic search. In particular, the old system relied on a TF-IDF-based algorithm, while the final model tries to overcome the limits of keyword search by combining Okapi BM25, a probabilistic information retrieval approach, with newer semantic search models based on SentenceBERT. The models and the implemented algorithm draw heavily on recent Information Retrieval techniques such as lexical search, similarity search, query expansion, document expansion and automatic question generation. The data used to test the models came from a banking dataset belonging to one of the company's clients, previously created for an Information Retrieval chatbot. Different experiments led to a final model that improves search performance, showing considerable advantages over both keyword search and pure semantic search.
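
To make the idea of a hybrid model concrete, the following is a minimal sketch of how a lexical Okapi BM25 score and a semantic similarity score could be fused into one ranking. It is not the method used in the thesis, since the abstract does not say how the two signals are combined. The weighted sum, the min-max normalization, the `alpha` weight, the function names, and the values `k1=1.5` and `b=0.75` are all assumptions chosen for illustration. The short vectors are hand-made stand-ins for SentenceBERT embeddings, which a real system would compute with a sentence encoder.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, doc_tokens, query_vec, doc_vecs, alpha=0.5):
    """Return document indices sorted by a weighted sum of both signals.

    alpha is a hypothetical weight: 1.0 is pure keyword search,
    0.0 is pure semantic search.
    """
    lexical = min_max(bm25_scores(query_tokens, doc_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Toy banking-style corpus; the embeddings are invented placeholders.
docs = [
    "how to open a bank account".split(),
    "card was blocked abroad".split(),
    "mortgage interest rates".split(),
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
query = "my card is blocked".split()
query_vec = [0.2, 0.9, 0.0]

print(hybrid_rank(query, docs, query_vec, doc_vecs))
```

Normalizing both score lists before mixing them matters because BM25 scores are unbounded while cosine similarity lies in [-1, 1]. Without a common scale, one signal would dominate regardless of `alpha`.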

Document type
Degree thesis (Laurea magistrale)
Thesis author
Bonetti, Lorenzo
Thesis supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
OkapiBM25, SentenceBERT, Keyword search, Semantic search, Question generation, Information Retrieval, document expansion
Thesis defence date
3 December 2021
URI
