D'Amicantonio, Giacomo
(2021)
Improvements to knowledge distillation of deep neural networks.
[Laurea magistrale], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]
Abstract
One of the main problems in the field of Artificial Intelligence is the efficiency of neural network models. In the past few years, it seemed that most tasks involving such models could simply be solved by designing larger, deeper models and training them on larger datasets for longer. This approach requires better-performing, and therefore more expensive and energy-consuming, hardware, and it will have an increasingly significant environmental impact as those models are deployed at scale.
In 2015, G. Hinton, J. Dean and O. Vinyals presented Knowledge Distillation (KD), a technique that leverages the logits produced by a big, cumbersome model to guide the training of a smaller model. The two networks are called "Teacher" and "Student", by analogy between the large model, which already holds a lot of knowledge, and the small model, which has yet to learn everything. They showed that it is possible to extract useful knowledge from the teacher's logits and use it to obtain a student that performs better than the same model trained on its own.
This thesis provides an overview of the current state of the art in Knowledge Distillation, analyses some of the most interesting approaches, and builds on them to exploit very confident logits more effectively. Furthermore, it provides experimental evidence of the importance of also using the smaller logit entries and of correcting the teacher's mistaken predictions during the distillation process.
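For context, the following is a minimal sketch of the standard distillation loss from Hinton et al. (2015) that the abstract refers to, written in PyTorch. The function name, the temperature T and the weight alpha are illustrative assumptions, not the specific improvements studied in this thesis.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of a softened KL term and standard cross-entropy."""
    # Soften both output distributions with temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions, scaled by T^2 as in the original paper.
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce

# Illustrative usage with random tensors standing in for real model outputs.
if __name__ == "__main__":
    student_logits = torch.randn(8, 10)   # batch of 8, 10 classes
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(kd_loss(student_logits, teacher_logits, labels))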
Document type
Degree thesis
(Laurea magistrale)
Thesis author
D'Amicantonio, Giacomo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Knowledge Distillation, Neural Networks, Computer Vision, ResNet, Logits
Thesis defence date
8 October 2021
URI