Using semantic entities to improve the distillation of transformers

Cozzi, Riccardo (2022) Using semantic entities to improve the distillation of transformers. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], Full-text document not available

The full text is not available, by the author's choice.

Abstract

In the last decade, the size of the deep neural architectures employed in Natural Language Processing (NLP) has increased exponentially, reaching in some cases hundreds of billions of parameters. However, training and deploying these huge architectures is an extremely resource-demanding process, and the costs are often not affordable in real-world applications. For these reasons, many research and industrial efforts are investigating solutions that reduce the size of these models while maintaining high performance. This work studies and experiments with Knowledge Distillation techniques, with the goal of training smaller and cheaper models that closely approximate large pre-trained ones. The first experiment reproduces DistilBERT, a recent and promising work, while trying to further reduce the resources involved in the process. In fact, we found that it is possible to achieve approximately the same score as the state of the art using only a small fraction of the data and training resources. The second experiment attempts the same distillation task with an architecture based on LUKE, a powerful entity-aware transformer that has recently shown how injecting semantic entities can positively influence the training of these models. Unfortunately, this second experiment did not give the result we hoped for, meaning that the task needs additional research effort.
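
As a rough illustration of the knowledge distillation idea named in the abstract, the sketch below computes a soft-target loss between a teacher's and a student's output logits. It is not taken from the thesis, whose full text is not available: the function names, the temperature value, and the choice of a temperature-softened KL divergence scaled by T^2 are assumptions based on the commonly used formulation of distillation.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating
    log_sum = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration only; the default temperature
    is an arbitrary assumption.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    student = [0.5, 2.5, 2.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions diverge. In practice such a term is usually combined with the ordinary training objective of the student model.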

Document type

Master's degree thesis (Laurea magistrale)

Thesis author

Cozzi, Riccardo

Thesis supervisor

Torroni, Paolo

Thesis co-supervisors

Zugarini, Andrea; Ernandes, Marco

School

Ingegneria e Architettura

Degree programme

Artificial intelligence [LM-DM270]

Degree programme regulations

DM270

Keywords

Transformers, knowledge distillation, neural network distillation, neural network compression, compression, distillation, NLP, natural language processing, transfer learning, deep learning, knowledge injection

Thesis defence date

22 March 2022

URI

https://amslaurea.unibo.it/id/eprint/25787
