Using semantic entities to improve the distillation of transformers

Cozzi, Riccardo (2022) Using semantic entities to improve the distillation of transformers. [Laurea magistrale], Università di Bologna, Corso di Studio in Artificial intelligence [LM-DM270], full-text document not available
The full text is not available by the author's choice.

Abstract

In the last decade, the size of deep neural architectures employed in Natural Language Processing (NLP) has increased exponentially, in some cases reaching hundreds of billions of parameters. However, training and deploying these huge architectures is an extremely resource-demanding process, and the costs are often not affordable in real-world applications. For these reasons, considerable research and industrial effort is being invested in solutions that reduce the size of these models while maintaining high performance. This work studies and experiments with Knowledge Distillation techniques, with the goal of training smaller and cheaper models that provide a good approximation of large pre-trained ones. The experiments consist, first, of a reproduction of the recent and promising DistilBERT work while trying to further reduce the resources involved in the process. We found that it is possible to achieve approximately the same score as the state of the art while using only a small fraction of the data and training resources. The second experiment attempts the same distillation task with an architecture based on LUKE, a powerful entity-aware transformer that has recently shown how injecting semantic entities can positively influence the training of these models. Unfortunately, as we will see, this second experiment did not give the result we hoped for, meaning that the task needs additional research effort.
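
For context, the following is a minimal sketch of a soft-target distillation objective in the style of DistilBERT, written in PyTorch. The function name, hyperparameter values, and weighting scheme are illustrative assumptions and do not reproduce the code actually used in the thesis.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soften both output distributions with a temperature and align the
        # student to the teacher using KL divergence (the "distillation" term).
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        kd_term = F.kl_div(log_p_student, p_teacher,
                           reduction="batchmean") * temperature ** 2

        # Standard masked-language-modelling cross-entropy on the hard labels
        # (ignored positions marked with -100, as is common in Hugging Face models).
        ce_term = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,
        )

        # Weighted combination of the soft (teacher) and hard (label) signals.
        return alpha * kd_term + (1.0 - alpha) * ce_term

DistilBERT additionally uses a cosine loss that aligns student and teacher hidden states; that term is omitted here for brevity.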

Document type
Master's thesis (Laurea magistrale)
Thesis author
Cozzi, Riccardo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
Transformers, knowledge distillation, neural network distillation, neural network compression, compression, distillation, nlp, natural language processing, transfer learning, deep learning, knowledge injection
Thesis defense date
22 March 2022
URI

Other metadata

