Improvements to knowledge distillation of deep neural networks

D'Amicantonio, Giacomo (2021) Improvements to knowledge distillation of deep neural networks. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]
Full-text document available: PDF (Thesis), 2MB
Available under license: Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

One of the main problems in the field of Artificial Intelligence is the efficiency of neural network models. In the past few years, it seemed that most tasks involving such models could be solved simply by designing larger, deeper models and training them on larger datasets for longer. This approach requires better-performing, and therefore more expensive and energy-consuming, hardware, and it will have an increasingly significant environmental impact as those models are deployed at scale. In 2015 G. Hinton, O. Vinyals and J. Dean presented Knowledge Distillation (KD), a technique that leverages the logits produced by a large, cumbersome model to guide the training of a smaller model. The two networks are called “Teacher” and “Student”, by analogy between the large model, which holds extensive knowledge, and the small model, which has yet to learn it. They showed that useful knowledge can be extracted from the teacher's logits and used to obtain a student that performs better than the same model trained on its own. This thesis provides an overview of the current state of the art in Knowledge Distillation, analyses some of the most interesting approaches, and builds on them to exploit highly confident logits in a more effective way. Furthermore, it provides experimental evidence on the importance of also using the smaller logit entries and of correcting mistaken teacher predictions during the distillation process.
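For reference, the KD objective introduced by Hinton, Vinyals and Dean combines a soft-target term computed on temperature-scaled logits with the usual cross-entropy on hard labels. The PyTorch sketch below shows that loss, together with a hypothetical correct_teacher_logits helper that only illustrates the idea of correcting mistaken teacher predictions mentioned above; the temperature, loss weighting and correction rule are assumptions for illustration, not the configuration used in the thesis.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD loss: soft targets at temperature T plus hard-label cross-entropy."""
    # Soft-target term: KL divergence between softened teacher and student distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures,
    # as suggested in the original KD paper.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T ** 2)

    # Hard-target term: the usual cross-entropy against ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

def correct_teacher_logits(teacher_logits, labels, margin=1.0):
    """Hypothetical helper sketching the correction of mistaken teacher predictions.

    When the teacher's argmax disagrees with the ground-truth label, the logit of
    the true class is raised above the teacher's current maximum, so the soft
    targets no longer encode the mistake. This is an illustrative rule, not the
    method developed in the thesis.
    """
    corrected = teacher_logits.clone()
    wrong = teacher_logits.argmax(dim=1) != labels
    if wrong.any():
        max_logit = teacher_logits[wrong].max(dim=1).values
        corrected[wrong, labels[wrong]] = max_logit + margin
    return corrected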

Document type
Master's thesis (Laurea magistrale)
Thesis author
D'Amicantonio, Giacomo
Degree programme regulations
DM270
Keywords
Knowledge Distillation, Neural Networks, Computer Vision, ResNet, Logits
Thesis defence date
8 October 2021