Bridging the Lexical-Semantic Gap in Music Discovery: The Reranker as a Foundational Enabler for Hybrid Agentic RAG Architectures

Merola, Carlo (2026) Bridging the Lexical-Semantic Gap in Music Discovery: The Reranker as a Foundational Enabler for Hybrid Agentic RAG Architectures. [Master's degree thesis], Università di Bologna, Degree programme in Artificial Intelligence [LM-DM270], restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text not accessible until 5 May 2028.
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching, with an express prohibition on any directly or indirectly commercial use. All other rights to the material are reserved.

Abstract

This thesis addresses a key challenge in enterprise Retrieval-Augmented Generation: how to combine the scalability and robustness of lexical search with the semantic understanding required by natural-language interaction, without the infrastructural cost of full dense retrieval. The work is conducted in collaboration with Musixmatch on a large-scale music catalog. The main contribution is the design and validation of a hybrid lexical–semantic retrieval architecture in which semantic reranking is reframed not as a simple post-processing improvement, but as the core component that bridges the gap between lexical retrieval and user intent. The system combines query translation, lexical candidate generation, dense-signal document construction, semantic reranking, and offline semantic enrichment modules such as Named Entity Recognition. The thesis also introduces a scalable evaluation methodology based on synthetic queries over real enterprise documents with LLM-based relevance annotation. Experimental results show that semantic reranking significantly improves retrieval quality over both lexical and dense baselines. In particular, lightweight pointwise rerankers retain strong effectiveness with low latency and zero variable cost, while avoiding the aggregate context-window constraints of listwise reranking. The results further show that document construction is a critical architectural factor: structured semantic representations improve reranking and, in some downstream RAG settings, outperform configurations including full lyrical content. Reranked retrieval more than doubles the proportion of retrieved documents effectively used during answer generation. Overall, the thesis proposes a practical, modular blueprint for enterprise-grade AI search systems, showing that semantic capability can be added to existing lexical infrastructures through reranking, while preserving scalability, improving effectiveness, and enabling zero-migration semantic modernization.
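The two-stage flow summarized in the abstract (lexical candidate generation followed by pointwise semantic reranking over structured documents) can be illustrated with a minimal sketch. This is not the thesis implementation: the toy catalog, the field layout, the `RELATED` term table, and every function name below are hypothetical, and `toy_semantic_score` is only a stand-in for a real reranking model such as a cross-encoder.

```python
from typing import Callable

# Hypothetical structured documents (not data from the thesis).
CATALOG = {
    "t1": "title: Sad Summer | artist: Example Duo | mood: happy upbeat | theme: party",
    "t2": "title: Rainy Days | artist: Example Band | mood: sad melancholic | theme: heartbreak",
    "t3": "title: Night Drive | artist: Example Trio | mood: calm | theme: travel",
}

# Hypothetical related-term table standing in for learned semantics.
RELATED = {"sad": {"sad", "melancholic", "heartbreak"}}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace, treating field separators as spaces."""
    return text.lower().replace("|", " ").replace(":", " ").split()


def lexical_candidates(query: str, catalog: dict[str, str], k: int) -> list[str]:
    """Stage 1: keep the k documents sharing the most distinct terms with the query."""
    q = set(tokenize(query))
    hits = []
    for doc_id, doc in catalog.items():
        overlap = len(q & set(tokenize(doc)))
        if overlap > 0:
            hits.append((overlap, doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [doc_id for _, doc_id in hits[:k]]


def toy_semantic_score(query: str, doc: str) -> float:
    """Stand-in scorer: match expanded query terms against mood/theme fields only."""
    expanded: set[str] = set()
    for tok in tokenize(query):
        expanded |= RELATED.get(tok, {tok})
    fields = {}
    for part in doc.split("|"):
        name, _, value = part.partition(":")
        fields[name.strip()] = value.strip().lower()
    signal = set((fields.get("mood", "") + " " + fields.get("theme", "")).split())
    return float(len(expanded & signal))


def rerank_pointwise(
    query: str,
    candidate_ids: list[str],
    catalog: dict[str, str],
    score_fn: Callable[[str, str], float],
) -> list[str]:
    """Stage 2: score each (query, document) pair independently, then sort."""
    scored = [(score_fn(query, catalog[d]), d) for d in candidate_ids]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


candidates = lexical_candidates("sad songs", CATALOG, k=2)
reranked = rerank_pointwise("sad songs", candidates, CATALOG, toy_semantic_score)
print(candidates, reranked)
```

In this toy setup both "Sad Summer" and "Rainy Days" match the query term "sad" lexically, and only the second stage separates the track whose mood and theme actually fit the request. Because each pair is scored on its own, the candidate list never has to fit into a single model context, which is the property the abstract attributes to pointwise reranking.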

Document type
Master's degree thesis
Thesis author
Merola, Carlo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Retrieval-Augmented Generation, enterprise search, semantic reranking, lexical retrieval, hybrid retrieval, cross-encoder, listwise reranking, pointwise reranking, query translation, named entity recognition, synthetic data, LLM annotation, music discovery, modular architectures, agentic RAG, agents, LLM-as-a-judge, RAG, transformers, in-context learning, attention, SPLADE, dense retrieval, vector database, ANN, Large Language Models, relevance annotation, enterprise AI
Thesis defence date
26 March 2026
URI
