Cocchieri, Alessio
(2023)
Leveraging Large Language Model Distillation to Enhance Zero-Shot Named Entity Recognition and Classification.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree programme (Corso di Studio) in Artificial Intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Except where broader permissions are granted by the author, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.
Download (3MB)
Abstract
Named entity recognition and classification (NERC) is a crucial task in natural language processing. Annotations are central to this task, yet in real-world settings they are chronically difficult to obtain, and generalization to unseen types looms large. Embracing zero-shot learning becomes essential to overcome the absence of training examples. However, substantial prior knowledge is required to achieve strong results, particularly in domain-specific scenarios. Although large language models (LLMs) hold great potential, their computational cost and inefficiency severely hamper their applicability, favoring smaller specialized networks. In this work, we propose JUICER, the first LLM distillation framework for zero-shot NERC in resource-constrained environments. Concretely, JUICER transfers LLM knowledge to BERT-based models through a preliminary fine-tuning stage centered on generative data augmentation over massive pre-training corpora. Generalizability is further promoted by injecting textual descriptions of the target classes through cross-attention. We conduct extensive experiments on three BIO-format datasets adapted to the zero-shot setting. In this pursuit, we center our distillation process on biomedicine and assess adaptability to the news and legal domains. Our knowledge-distilled models outperform state-of-the-art baselines across all benchmarks by up to 0.27 macro-averaged F1 points, demonstrating the benefit of observing numerous classes. Compared to zero-shot and reparametrized LLMs, we achieve superior overall results across all datasets while using 510× fewer parameters. Interestingly, when trained in cross-domain setups, JUICER models gain a further increase of up to 0.07 points.
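To make the cross-attention idea from the abstract more concrete, the sketch below shows one plausible way a BERT-based student could score tokens against textual class descriptions. It is a minimal, hypothetical illustration only: the module, parameter names, and scoring head are assumptions of this sketch, not the implementation described in the thesis, and the BIO tagging scheme is simplified to a single score per class.

```python
# Hypothetical sketch: injecting textual class descriptions into a
# BERT-style token encoder via cross-attention for zero-shot NERC.
# All names and design choices here are illustrative assumptions.
import torch
import torch.nn as nn


class ClassDescriptionCrossAttention(nn.Module):
    """Token representations attend over encoded class descriptions,
    so unseen entity types can be scored from their descriptions alone."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)
        self.scale = hidden_size ** -0.5  # dot-product scaling for the scoring head

    def forward(self, token_states, class_desc_states):
        # token_states:      (batch, seq_len, hidden)  e.g. last hidden states of a BERT student
        # class_desc_states: (num_classes, hidden)     pooled embeddings of class descriptions
        classes = class_desc_states.unsqueeze(0).expand(token_states.size(0), -1, -1)
        fused, _ = self.cross_attn(query=token_states, key=classes, value=classes)
        fused = self.norm(token_states + fused)
        # Score each token against each class description (simplified zero-shot head,
        # ignoring the B-/I-/O- tag split of the BIO scheme).
        logits = torch.einsum("bsh,bch->bsc", fused, classes) * self.scale
        return logits  # argmax over the last dim -> predicted entity type per token


# Toy usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    layer = ClassDescriptionCrossAttention()
    tokens = torch.randn(2, 16, 768)   # batch of 2 sentences, 16 tokens each
    descs = torch.randn(5, 768)        # 5 candidate class descriptions
    print(layer(tokens, descs).shape)  # torch.Size([2, 16, 5])
```

Because the classes enter only through their description embeddings, new (unseen) entity types can, in principle, be scored at inference time by encoding their descriptions, which is the property the abstract attributes to the cross-attention injection.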
Document type: Degree thesis (Master's degree, Laurea magistrale)
Thesis author: Cocchieri, Alessio
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Programme regulations (Ordinamento CdS): DM270
Keywords: Large Language Models, Named Entity Recognition, Natural Language Processing, Knowledge Distillation, Zero-shot Learning
Thesis defence date: 21 October 2023
URI: