Angileri, Chiara
(2024)
Performing anomaly detection on logs from an analytical platform.
[Master's thesis], Università di Bologna, Degree programme in
Artificial intelligence [LM-DM270], Full-text document not available
The full text is not available by the author's choice.
Abstract
Keeping software systems secure and running smoothly is essential. This thesis studies how machine learning can detect anomalies in logs generated by an analytical platform. The platform under analysis is designed for risk and model management, and it produces a large volume of logs that capture system and user actions and events. Analysing these logs helps identify issues such as security breaches, operational failures or performance bottlenecks.
To address this challenge, two main techniques are tested. The first uses supervised learning with Random Forest to classify individual logs as normal or anomalous based on historical patterns. The second employs unsupervised learning, using Isolation Forest and clustering to find entire sessions that deviate from the norm. Logs are preprocessed using a combination of regex patterns and the Drain log parser, turning the raw text into structured data that can be fed into machine learning models. A key enabler for these experiments was the generation of logs from a penetration test that simulated attacks.
The results show that both methodologies are effective in identifying anomalies, with each offering different advantages. By automating log analysis, this research shows how machine learning can be applied to a real, complex system to improve safety and reliability. Moreover, the work highlights the importance of choosing preprocessing techniques and models according to the type of anomaly under analysis. Overall, this thesis emphasises the use of log anomaly detection for the supervision and protection of real-world analytical platforms, contributing to the expanding domain of security and operational intelligence.
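The preprocessing step described in the abstract, turning raw log text into structured data grouped by session, can be illustrated with a minimal sketch. This is not the thesis's implementation: the masking rules, the function names `to_template` and `session_vectors`, and the sample log lines are all hypothetical, and plain regex masking is used here as a much simpler stand-in for the Drain parser.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules: each replaces a variable field with a placeholder
# so that lines differing only in their parameters share one template.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message):
    """Mask variable fields in one log message (simplified stand-in for Drain)."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def session_vectors(records):
    """Build one template-count vector per session.

    records: iterable of (session_id, message) pairs.
    Returns the sorted template list and a dict session_id -> list of counts.
    """
    counts = defaultdict(Counter)
    for session_id, message in records:
        counts[session_id][to_template(message)] += 1
    templates = sorted({t for c in counts.values() for t in c})
    vectors = {sid: [c[t] for t in templates] for sid, c in counts.items()}
    return templates, vectors


# Made-up sample data, for illustration only.
records = [
    ("s1", "User 42 logged in from 10.0.0.5"),
    ("s1", "Model 7 scored in 120 ms"),
    ("s2", "User 43 logged in from 10.0.0.9"),
    ("s2", "Login failed for user 43 from 10.0.0.9"),
    ("s2", "Login failed for user 43 from 10.0.0.9"),
]
templates, vectors = session_vectors(records)
```

Under these assumptions, each session becomes a fixed-length numeric vector of template counts. Vectors of this kind could then be passed to an unsupervised detector such as scikit-learn's `IsolationForest`, while per-line templates with labels could feed a supervised classifier such as `RandomForestClassifier`.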
Document type
Degree thesis
(Master's degree)
Thesis author
Angileri, Chiara
Thesis supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
Anomaly, Model, Machine learning, Logs, Analysis, Big data
Thesis defence date
8 October 2024
URI