Cortesi, Gabriel
(2023)
Design, Implementation and Evaluation of Parallel Solutions for a Nested Explainability Algorithm.
[Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in
Computer Engineering (Ingegneria informatica) [LM-DM270]
Abstract
In Machine Learning and Data Science, the need for performance keeps growing as workloads become increasingly complex. Parallelization across multiple cores and machines (clusters) is often employed to improve performance significantly.
This work specifically considers the explainability algorithm GLEAMS (Global & Local ExplainAbility of black-box Models through Space partitioning) and the poor performance of its sequential Python implementation. GLEAMS is a post-hoc, model-agnostic explainability technique that gives a global understanding of the original model by recursively partitioning the input space into non-overlapping cells, each featuring a local linear approximation of the black-box model.
The purpose of this work is the analysis, development, implementation and testing of a parallel and distributed solution for the sequential GLEAMS explainability algorithm. The algorithm poses interesting parallelization challenges, such as a recursive binary tree and nested parallelism. The nested nature of the parallelism is particularly relevant because of the complexity it introduces and the poor support that existing Python frameworks and solutions offer for it.
Multiple solutions were designed and implemented. This thesis describes the steps taken in their development, justifies the choices made, explains how they work, illustrates their differences and extensively analyses their performance. In particular, this work proposes an asyncio-based approach, in combination with the Ray framework, as a practical solution to many of the limitations of current nested parallelism support in Python. Additionally, some theoretical and more general approaches and solutions inspired by other languages are proposed and discussed.
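To make the nested-parallelism idea concrete, the following is a minimal sketch and not the implementation from the thesis. It assumes a one-dimensional input space, a hypothetical `fit_cell` helper that fits a least-squares line to a cell, and a standard-library thread pool standing in for Ray workers. Each tree node is a coroutine: it offloads its blocking fit to the pool, then awaits its two child subtrees concurrently, so no worker is held while it waits for nested work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def fit_cell(points):
    """Hypothetical per-cell work: least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


async def build_tree(points, pool, depth=0, max_depth=2):
    """Recursively split the cell in two, fitting a local linear model per node."""
    loop = asyncio.get_running_loop()
    # The blocking fit runs in the pool (assumption: a stand-in for a remote
    # worker); this coroutine is suspended meanwhile and occupies no worker.
    model = await loop.run_in_executor(pool, fit_cell, points)
    if depth == max_depth or len(points) < 4:
        return {"model": model, "children": []}
    ordered = sorted(points)
    mid = len(ordered) // 2
    # Nested parallelism: both subtrees progress concurrently, and each of
    # them may in turn spawn its own children.
    left, right = await asyncio.gather(
        build_tree(ordered[:mid], pool, depth + 1, max_depth),
        build_tree(ordered[mid:], pool, depth + 1, max_depth),
    )
    return {"model": model, "children": [left, right]}


def count_leaves(node):
    """Number of cells (leaves) in the resulting partition."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


async def main():
    # Toy data sampled from y = 2x + 1 (illustrative only).
    data = [(float(x), 2.0 * x + 1.0) for x in range(16)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await build_tree(data, pool)


if __name__ == "__main__":
    tree = asyncio.run(main())
    print(count_leaves(tree), tree["model"])
```

The design point illustrated here is that the recursion lives in lightweight coroutines while only the leaf-level computation consumes pool workers. A naive alternative, in which each pooled task blocks while waiting for its own child tasks, can exhaust a fixed-size pool as the tree deepens.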
Document type
Degree thesis
(Master's degree, Laurea magistrale)
Thesis author
Cortesi, Gabriel
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Parallel, Distributed, Explainability, Machine Learning, MOB, GLEAMS, Nested, Global-Lime
Thesis defence date
23 March 2023
URI