Efficient Distributed Learning with PowerSGD

Younis, Omar Gallal Aly (2024) Efficient Distributed Learning with PowerSGD. [Laurea magistrale], Università di Bologna, Corso di Studio in Artificial intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons: Attribution - NonCommercial - ShareAlike 4.0 (CC BY-NC-SA 4.0)
Abstract

Deep learning models are becoming increasingly complex and require substantial computational resources, which are often only available by combining multiple devices such as GPUs. Distributing the workload across these devices, however, poses many engineering challenges. One of the main challenges is keeping communication between devices fast enough, especially when scaling to multiple nodes. Our experimental study shows that, in some scenarios, communication becomes the major bottleneck, and adding more devices makes training slower rather than faster. To address this issue, researchers have proposed compression algorithms designed to reduce the amount of data that must be communicated (usually the gradients) while retaining as much information as possible. For a compression algorithm to be effective, it must be fast enough that the time saved in communication outweighs the cost of compressing, and accurate enough that it does not harm training. Our dissertation focuses on the state-of-the-art gradient compression algorithm PowerSGD and on how we improved it in both speed and accuracy. Specifically, we made compression around 20 times faster, making it effective for a wider range of use cases. We also improved the accuracy of the gradient compression without affecting speed, which made training converge twice as fast as with standard PowerSGD in the scenario we tested. To make these improvements available to the community, we contributed them to PyTorch 1.11 and published all the code used for the experiments, together with our improved PowerSGD versions, on GitHub.
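Since the abstract mentions the PyTorch contribution, the following is a minimal sketch of how PowerSGD gradient compression can be enabled on a DistributedDataParallel model through PyTorch's built-in DDP communication hook (available in torch.distributed since roughly PyTorch 1.10/1.11). The toy model, backend choice, and hyperparameter values are illustrative assumptions and are not taken from the thesis code.

```python
# Minimal sketch: attaching PyTorch's built-in PowerSGD compression hook to DDP.
# Launch with torchrun, which provides RANK/WORLD_SIZE/LOCAL_RANK in the environment.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD


def main() -> None:
    # CPU-only sketch with the gloo backend; on GPUs one would typically use "nccl".
    dist.init_process_group(backend="gloo")

    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
    ddp_model = DDP(model)

    # PowerSGD replaces the dense all-reduce of each gradient bucket with the
    # all-reduce of two low-rank factors, so communication volume scales with
    # the chosen rank instead of with the full gradient size.
    state = powerSGD.PowerSGDState(
        process_group=None,            # use the default process group
        matrix_approximation_rank=2,   # rank of the low-rank approximation (illustrative)
        start_powerSGD_iter=10,        # warm up with uncompressed all-reduce first
        use_error_feedback=True,       # feed compression error back into later steps
        warm_start=True,               # reuse factors across iterations to save compute
    )
    ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)

    # Training is otherwise unchanged: backward() now triggers the compressed
    # gradient communication transparently.
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    x = torch.randn(32, 1024)
    loss = ddp_model(x).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The hook-based design keeps the compression logic separate from the training loop, which is why the thesis's improvements could be upstreamed into PyTorch and reused without changing user code.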

Document type: Thesis (Laurea magistrale / Master's degree)
Thesis author: Younis, Omar Gallal Aly
Degree programme regulations: DM270
Keywords: Distributed learning, Deep learning, Federated learning, compression algorithms, Large scale, Scaling laws, Large language models
Thesis defence date: 19 March 2024
