Bridging the Lexical-Semantic Gap in Music Discovery: The Reranker as a Foundational Enabler for Hybrid Agentic RAG Architectures

Merola, Carlo (2026) Bridging the Lexical-Semantic Gap in Music Discovery: The Reranker as a Foundational Enabler for Hybrid Agentic RAG Architectures. [Master's degree thesis], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270], Restricted-access document.


Full-text documents available:

PDF document (Thesis, 734 kB)
Full text not accessible until 5 May 2028.
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal study, research, and teaching purposes; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

This thesis addresses a key challenge in enterprise Retrieval-Augmented Generation: how to combine the scalability and robustness of lexical search with the semantic understanding required by natural-language interaction, without the infrastructural cost of full dense retrieval. The work is conducted in collaboration with Musixmatch on a large-scale music catalog. The main contribution is the design and validation of a hybrid lexical–semantic retrieval architecture in which semantic reranking is reframed not as a simple post-processing improvement, but as the core component that bridges the gap between lexical retrieval and user intent. The system combines query translation, lexical candidate generation, dense-signal document construction, semantic reranking, and offline semantic enrichment modules such as Named Entity Recognition. The thesis also introduces a scalable evaluation methodology based on synthetic queries over real enterprise documents with LLM-based relevance annotation. Experimental results show that semantic reranking significantly improves retrieval quality over both lexical and dense baselines. In particular, lightweight pointwise rerankers retain strong effectiveness with low latency and zero variable cost, while avoiding the aggregate context-window constraints of listwise reranking. The results further show that document construction is a critical architectural factor: structured semantic representations improve reranking and, in some downstream RAG settings, outperform configurations including full lyrical content. Reranked retrieval more than doubles the proportion of retrieved documents effectively used during answer generation. Overall, the thesis proposes a practical, modular blueprint for enterprise-grade AI search systems, showing that semantic capability can be added to existing lexical infrastructures through reranking, while preserving scalability, improving effectiveness, and enabling zero-migration semantic modernization.
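The abstract describes lexical candidate generation followed by pointwise semantic reranking. A minimal Python sketch of that two-stage pattern follows, assuming Okapi BM25 for the lexical stage and a hypothetical keyword lexicon (`CONCEPT_TERMS`, `toy_semantic_score`) standing in for a learned cross-encoder; all names and data are illustrative, and this is not the system built in the thesis. The property it shows is that a pointwise reranker scores each (query, document) pair independently, so it never has to fit all candidates into one shared context window, as a listwise reranker would.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (lexical stage)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def lexical_candidates(query, docs, k):
    """Return the ids of the top-k documents by BM25 (candidate generation)."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical concept lexicon: a toy stand-in for a learned cross-encoder.
CONCEPT_TERMS = {
    "sad": {"sad", "heartbreak", "tears", "melancholy"},
    "song": {"song", "ballad", "track"},
}


def toy_semantic_score(query, doc):
    """Score one (query, document) pair in isolation; no other documents are seen."""
    doc_terms = set(tokenize(doc))
    return sum(len(CONCEPT_TERMS.get(q, {q}) & doc_terms) for q in tokenize(query))


def pointwise_rerank(query, docs, candidate_ids, score_fn):
    """Reorder lexical candidates by an independent per-pair relevance score.

    sorted() is stable, so ties keep their lexical (BM25) order.
    """
    return sorted(candidate_ids, key=lambda i: score_fn(query, docs[i]), reverse=True)


docs = [
    "sad sad sad radio jingle",
    "heartbreak ballad song with tears",
    "happy dance song",
    "weather report",
]
query = "sad song"
candidates = lexical_candidates(query, docs, k=3)
reranked = pointwise_rerank(query, docs, candidates, toy_semantic_score)
print(candidates, reranked)
```

With these toy inputs the keyword-heavy jingle wins the lexical stage (`[0, 2, 1]`) and reranking promotes the ballad (`[1, 0, 2]`). Reranking can only reorder what the lexical stage returned: a relevant document outside the top-k candidates is never scored, so the quality of the candidate pool (for example, after query translation) bounds what the reranker can achieve.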


Document type

Degree thesis (Master's degree)

Thesis author

Merola, Carlo

Thesis supervisor

Torroni, Paolo

Thesis co-supervisors

Mancini, Eleonora ; Tavella, Maria Stella

School

Engineering and Architecture

Degree programme

Artificial intelligence [LM-DM270]

Programme regulations (CdS)

DM270

Keywords

Retrieval-Augmented Generation, enterprise search, semantic reranking, lexical retrieval, hybrid retrieval, cross-encoder, listwise reranking, pointwise reranking, query translation, named entity recognition, synthetic data, LLM annotation, music discovery, modular architectures, agentic RAG, agents, LLM-as-a-judge, RAG, transformers, in-context learning, attention, SPLADE, dense retrieval, vector database, ANN, Large Language Models, relevance annotation, enterprise AI

Thesis defense date

26 March 2026

URI

https://amslaurea.unibo.it/id/eprint/38638
