Optimizing Listwise Reranking with ModernBERT: A Data-Centric Approach with a Novel Graded Corpus

Maranzana, Mattia (2025) Optimizing Listwise Reranking with ModernBERT: A Data-Centric Approach with a Novel Graded Corpus. [Master's degree thesis], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research and teaching. Any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Download (814kB)

Abstract

This thesis investigates how ModernBERT can be used as an effective cross-encoder reranker within contemporary information retrieval systems. To enable this, a graded-relevance dataset was constructed, offering richer and more expressive supervision than traditional binary relevance labels. This dataset allows the model to learn subtle distinctions in relevance and better capture the nuanced relationships between queries and documents. The reranker is evaluated in a zero-shot setting on a broad selection of BEIR datasets spanning multiple domains. Despite never being fine-tuned on these tasks, ModernBERT consistently strengthens the final rankings produced by the retrieval pipeline, demonstrating an ability to transfer its learned ranking behaviour beyond the supervised training distribution. The improvements are especially clear on datasets whose structure and content align naturally with the type of signals present in the constructed training data. Overall, the thesis shows that ModernBERT is a capable and adaptable reranking model, benefiting significantly from graded supervision and generalising well across diverse retrieval scenarios. These findings highlight the value of high-quality relevance annotations and point toward future work on expanding supervision sources and refining reranking architectures for even stronger performance.
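To illustrate why graded labels carry more information than binary ones, the following is a minimal sketch and not the thesis's evaluation code. It compares two hypothetical rankings under nDCG, the metric commonly reported on BEIR. The 0-3 grade scale, the example rankings, and the helper names (`dcg_at_k`, `ndcg_at_k`, `binarize`) are assumptions made for illustration; the abstract does not specify the grading scheme or the metric used.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions,
    using the exponential gain form (2**g - 1) / log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def binarize(gains):
    """Collapse graded labels to binary: any grade above 0 counts as relevant."""
    return [1 if g > 0 else 0 for g in gains]


# Hypothetical grades (assumed scale 0-3) of four passages,
# listed in the order two different rerankers returned them.
ranking_a = [3, 1, 2, 0]  # best passage placed first
ranking_b = [1, 3, 2, 0]  # best passage placed second

graded_a = ndcg_at_k(ranking_a, 4)
graded_b = ndcg_at_k(ranking_b, 4)
binary_a = ndcg_at_k(binarize(ranking_a), 4)
binary_b = ndcg_at_k(binarize(ranking_b), 4)

print(f"graded: A={graded_a:.3f}  B={graded_b:.3f}")
print(f"binary: A={binary_a:.3f}  B={binary_b:.3f}")
```

With binary labels the two rankings are indistinguishable, because both place three relevant passages ahead of the irrelevant one. With graded labels, ranking A scores higher because it puts the most relevant passage first. A model trained on graded supervision can learn from this kind of distinction.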

Document type: Degree thesis (Master's degree)
Thesis author: Maranzana, Mattia
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Degree programme regulations: DM270
Keywords: natural language processing, passage reranking, listwise reranking, ModernBERT, dataset creation
Thesis defence date: 4 December 2025
URI:
