Younis, Omar Gallal Aly (2024) Efficient Distributed Learning with PowerSGD. [Laurea magistrale], Università di Bologna, Corso di Studio in Artificial Intelligence [LM-DM270]
Abstract
Deep learning models are becoming more complex and require substantial computational resources, which are often only available by combining multiple devices such as GPUs. However, distributing the workload across these devices poses many engineering challenges. One of the main challenges is ensuring that communication between the devices is fast enough, even when scaling to multiple nodes. Our experimental study shows that, in some scenarios, communication becomes the major bottleneck, and adding more devices makes training slower instead of faster. To address this issue, researchers have proposed compression algorithms designed to reduce the size of the data that needs to be communicated (usually the gradients) while retaining as much information as possible. For a compression algorithm to be effective, it must be fast enough that the time saved in communication outweighs its cost, and accurate enough that it does not negatively affect training.
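To make the idea concrete, the following is a minimal sketch (not the thesis code) of rank-r gradient compression in the spirit of PowerSGD: the gradient, reshaped into a matrix M, is approximated as P Qᵀ via a single power-iteration step, so only the two thin factors need to be communicated. The function name, shapes, and rank are illustrative assumptions; the actual algorithm also warm-starts Q across iterations and applies error feedback.

```python
# Illustrative sketch of PowerSGD-style rank-r compression (not the thesis
# implementation): approximate a gradient matrix M ~ P @ Q.T with one
# power-iteration step, so only the thin factors P (n x r) and Q (m x r)
# would be communicated instead of the full n x m gradient.
import torch

def low_rank_compress(M: torch.Tensor, rank: int = 2) -> torch.Tensor:
    n, m = M.shape
    Q = torch.randn(m, rank, device=M.device)   # in PowerSGD, Q is warm-started across steps
    P = M @ Q                                   # left factor, shape (n, rank)
    P, _ = torch.linalg.qr(P)                   # orthogonalise the columns of P
    Q = M.t() @ P                               # refined right factor, shape (m, rank)
    return P @ Q.t()                            # low-rank reconstruction of the gradient

grad = torch.randn(512, 1024)
approx = low_rank_compress(grad, rank=4)
print((grad - approx).norm() / grad.norm())     # relative approximation error
```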
Our dissertation focuses on the state-of-the-art gradient compression algorithm PowerSGD and on how we improved it in both speed and accuracy. Specifically, we made compression around 20 times faster, making it effective for a wider range of use cases. We also improved the accuracy of the gradient compression without affecting its speed, which led the training process to converge twice as fast as with standard PowerSGD in the scenario we tested. To make these improvements available to the community, we contributed them to PyTorch 1.11 and published all the code used for the experiments, together with our improved PowerSGD variants, on GitHub.
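For context, PyTorch ships PowerSGD as a DistributedDataParallel communication hook, which is the integration point the abstract refers to. The sketch below shows how such a hook is typically enabled; the hyperparameter values, MyModel, and local_rank are placeholders for illustration, not the settings used in the thesis.

```python
# Sketch (under the assumptions above) of enabling the built-in PowerSGD
# gradient-compression hook in PyTorch DistributedDataParallel.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])       # placeholder: set by the launcher
model = DDP(MyModel().to(local_rank),            # MyModel is a hypothetical module
            device_ids=[local_rank])

state = powerSGD.PowerSGDState(
    process_group=None,              # use the default process group
    matrix_approximation_rank=2,     # rank of the low-rank gradient approximation
    start_powerSGD_iter=100,         # plain allreduce for the first iterations
)
model.register_comm_hook(state, powerSGD.powerSGD_hook)  # gradients are now compressed
```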
Document type
Degree thesis
(Laurea magistrale)
Thesis author
Younis, Omar Gallal Aly
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Distributed learning, Deep learning, Federated learning, Compression algorithms, Large scale, Scaling laws, Large language models
Thesis defence date
19 March 2024
URI