Berti, Matteo
(2020)
Anomalous Activity Detection with Temporal Convolutional Networks in HPC Systems.
[Master's degree thesis], Università di Bologna, Degree Programme in
Computer Science [LM-DM270]
Abstract
Detecting suspicious or unauthorized activities is an important concern for High-Performance Computing (HPC) system administrators. Automatic classification of the programs running on these systems could be a valuable aid towards this goal. This thesis proposes a machine learning model capable of classifying programs running on an HPC system into various types by monitoring metrics associated with different physical and architectural system components. As a specific case study, we consider the problem of detecting password-cracking programs that may have been introduced into the normal workload of an HPC system through clandestine means.
Our study is based on data collected from an HPC system called DAVIDE, installed at Cineca. These data correspond to hundreds of physical and architectural metrics that are defined for this system. We rely on Principal Component Analysis (PCA), as well as our personal knowledge of the system, to select a subset of metrics to be used for the analysis. A time series oversampling technique is also proposed in order to increase the available data related to password-cracking activities. Finally, a deep learning model based on Temporal Convolutional Networks (TCNs) is presented, with the goal of distinguishing between anomalous and normal activities.
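As an illustration of the building block that TCNs are generally built on, the following is a minimal sketch of a causal dilated 1D convolution in plain Python. It is not the architecture used in the thesis: the function names, the two-tap kernel, the dilation schedule (1, 2, 4) and the sample series are all assumptions chosen for illustration, and a real TCN would add learned weights, nonlinearities, and residual connections.

```python
from typing import List


def causal_dilated_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Causal dilated convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the series are treated as zero (left padding),
    so y[t] never depends on values later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of time steps visible at the output after stacking one convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical metric trace (for example, one monitored metric sampled over time).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [0.5, 0.5]  # illustrative fixed weights, not learned

h = series
for d in (1, 2, 4):  # exponentially growing dilations (an assumed schedule)
    h = causal_dilated_conv1d(h, kernel, d)

print(h)
print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

With a kernel of size 2 and dilations 1, 2 and 4, the last output depends on the 8 most recent samples, which is why stacking dilated layers lets a small network cover long stretches of a time series.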
Our results show that the proposed model achieves high classification accuracy on both balanced (95%) and imbalanced (98%) datasets. The proposed network achieves an F-score of 95.5% when trained on a balanced dataset, and an AUC-ROC of 0.99 for both balanced and imbalanced data.
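To make the relationship between accuracy and F-score on balanced and imbalanced data concrete, the sketch below computes both from confusion-matrix counts. The counts are hypothetical and are not taken from the thesis, and the F-score is assumed here to be the F1 variant (the abstract does not say which one is used).

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (anomalous) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced set: 100 anomalous and 900 normal samples.
# A degenerate classifier that never raises an alarm still scores 90% accuracy,
# but its F1 on the anomalous class is zero.
print(accuracy(tp=0, fp=0, fn=100, tn=900))
print(f1_score(tp=0, fp=0, fn=100))

# A hypothetical classifier that finds 90 of the 100 anomalies with 10 false alarms.
print(accuracy(tp=90, fp=10, fn=10, tn=890))
print(f1_score(tp=90, fp=10, fn=10))
```

This is the usual reason for reporting an F-score or AUC-ROC alongside accuracy when the classes are imbalanced: accuracy alone can look high even when the minority class is poorly detected.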
Document type
Degree thesis
(Master's degree)
Thesis author
Berti, Matteo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Track
CURRICULUM A: TECNICHE DEL SOFTWARE
Degree programme regulations
DM270
Keywords
HPC, Machine Learning, TCN, High Performance Computing, Anomaly Detection, Temporal Convolutional Networks, Neural Networks, Cineca
Thesis defence date
17 December 2020
URI