Leveraging Large Language Model Distillation to Enhance Zero-Shot Named Entity Recognition and Classification

Cocchieri, Alessio (2023) Leveraging Large Language Model Distillation to Enhance Zero-Shot Named Entity Recognition and Classification. [Laurea magistrale (Master's degree thesis)], Università di Bologna, degree programme in Artificial Intelligence [LM-DM270], restricted-access document.
Full-text documents available:
PDF document (Thesis), 3 MB: full text not accessible until 2 September 2024.
Available under licence: unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal study, research, and teaching purposes; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.

Abstract

Named entity recognition and classification (NERC) is a crucial task in natural language processing. Annotated data is central to this task, but in real-world settings annotations are chronically difficult to obtain and generalization to unseen types looms large. Embracing zero-shot learning becomes essential to overcome the absence of training examples. However, substantial prior knowledge is required to achieve strong results, particularly in domain-specific scenarios. Although large language models (LLMs) hold great potential, their computational cost and inefficiency severely hamper their applicability, favoring smaller specialized networks. In this work, we propose JUICER, the first LLM distillation framework for zero-shot NERC in resource-constrained environments. Mechanically, JUICER transfers LLM knowledge to BERT-based models through a preliminary fine-tuning process centered on generative data augmentation over massive pre-training corpora. Generalizability is further promoted by injecting textual target class descriptions through cross-attention. We conduct extensive experiments on three BIO-format datasets adapted to the zero-shot setting. In this pursuit, we center our distillation process on biomedicine and assess adaptability to the news and legal domains. Our knowledge-distilled models outperform state-of-the-art baselines across all benchmarks by up to 0.27 macro-averaged F1 points, proving the benefit of numerous observed classes. Compared to zero-shot and reparametrized LLMs, we achieve superior overall results across all datasets using 510× fewer parameters. Interestingly, when trained in cross-domain setups, JUICER models experience a further increase of up to 0.07 points.
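Since the thesis itself is under embargo and the abstract describes the architecture only at a high level, the following is purely an illustrative sketch of the class-description injection it mentions: a BERT token encoder, a second encoder for textual class descriptions, and a single cross-attention layer that lets token representations attend to those descriptions before BIO tagging. Every module name, hyperparameter, and example input here is hypothetical and not taken from the thesis, and the distillation and generative data augmentation steps are not shown.

```python
# Minimal, hypothetical sketch (not the thesis implementation): a BERT-based
# zero-shot NERC tagger that injects textual class descriptions via cross-attention.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class DescriptionConditionedTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-cased", num_bio_tags=3, n_heads=8):
        super().__init__()
        self.token_encoder = AutoModel.from_pretrained(encoder_name)  # encodes the input sentence
        self.desc_encoder = AutoModel.from_pretrained(encoder_name)   # encodes target class descriptions
        hidden = self.token_encoder.config.hidden_size
        # Cross-attention: token representations (queries) attend to
        # class-description representations (keys/values).
        self.cross_attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.classifier = nn.Linear(hidden, num_bio_tags)             # B/I/O logits per token

    def forward(self, tokens, descriptions):
        tok = self.token_encoder(**tokens).last_hidden_state            # (B, T, H)
        # Pool each class description into a single vector via its [CLS] token.
        desc = self.desc_encoder(**descriptions).last_hidden_state[:, 0]  # (C, H)
        desc = desc.unsqueeze(0).expand(tok.size(0), -1, -1)            # (B, C, H)
        fused, _ = self.cross_attn(query=tok, key=desc, value=desc)     # (B, T, H)
        return self.classifier(tok + fused)                             # per-token BIO logits

# Hypothetical usage with made-up target classes:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = DescriptionConditionedTagger()
sentence = tokenizer(["Aspirin reduces fever."], return_tensors="pt")
classes = tokenizer(["a drug or medication", "a disease or symptom"],
                    padding=True, return_tensors="pt")
logits = model(sentence, classes)  # shape: (1, seq_len, num_bio_tags)
```

In such a setup, unseen entity types can be targeted at inference time simply by supplying new textual descriptions, which is one plausible reading of how the abstract's cross-attention injection supports zero-shot generalization.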

Document type: Master's degree thesis (Tesi di laurea, Laurea magistrale)
Thesis author: Cocchieri, Alessio
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Degree programme regulation (Ordinamento CdS): DM270
Keywords: Large Language Models, Named Entity Recognition, Natural Language Processing, Knowledge Distillation, Zero-shot Learning
Thesis defence date: 21 October 2023
URI:
