Performing anomaly detection on logs from an analytical platform

Angileri, Chiara (2024) Performing anomaly detection on logs from an analytical platform. [Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270], full-text document not available

The full text is not available, by the author's choice.

Abstract

Securing software systems and keeping them running smoothly is increasingly important. This thesis studies how machine learning can detect anomalies in logs generated by an analytical platform. The platform under analysis is designed for risk and model management, and it produces a large volume of logs that capture system and user actions and events. Analysing these logs helps identify issues such as security breaches, operational failures and performance bottlenecks. Two main techniques are tested. The first uses supervised learning with Random Forest to classify individual logs as normal or anomalous based on historical patterns. The second employs unsupervised learning, using Isolation Forest and clustering to find entire sessions that deviate from the norm. Logs are preprocessed using a combination of regex patterns and the Drain log parser, turning the raw text into structured data that can be fed into machine learning models. A key requirement for these experiments was the generation of logs from a penetration test that simulates attacks. The results show that both approaches are effective in identifying anomalies, with each offering different advantages. By automating log analysis, this research shows how machine learning can be applied to a real, complex system to improve safety and reliability. The work also highlights the importance of choosing preprocessing techniques and models according to the type of anomaly under analysis. Overall, this thesis emphasises the use of log anomaly detection for the supervision and protection of real-world analytical platforms, contributing to the expanding domain of security and operational intelligence.
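
The preprocessing idea in the abstract (raw log text turned into structured data, then grouped by session) can be illustrated with a minimal sketch. It is not the thesis's pipeline: the full text is unavailable, so the masking rules, the log lines, and the names `MASKS`, `to_template` and `session_vectors` are all hypothetical. It also uses plain regex masking as a simplified stand-in for template extraction and does not implement the Drain parser.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules (assumptions, not taken from the thesis):
# each one replaces a variable field with a fixed placeholder.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Mask variable fields so that similar messages share one template."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def session_vectors(records):
    """Turn (session_id, message) pairs into per-session template counts.

    Returns the sorted list of templates and a dict mapping each session
    to a count vector aligned with that list.
    """
    counts = defaultdict(Counter)
    for session_id, message in records:
        counts[session_id][to_template(message)] += 1
    templates = sorted({t for c in counts.values() for t in c})
    vectors = {s: [c[t] for t in templates] for s, c in counts.items()}
    return templates, vectors


# Made-up log lines, for illustration only.
LOGS = [
    ("s1", "User 42 logged in from 10.0.0.5"),
    ("s1", "Model 7 scored in 130 ms"),
    ("s2", "User 17 logged in from 10.0.0.9"),
    ("s2", "Login failed for user 17 from 10.0.0.9"),
    ("s2", "Login failed for user 17 from 10.0.0.9"),
]

templates, vectors = session_vectors(LOGS)
for session_id, vector in vectors.items():
    print(session_id, vector)
```

Count vectors of this kind are one plausible input for a session-level unsupervised detector, for example scikit-learn's `IsolationForest`. Whether the thesis represents sessions this way is an assumption.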

Document type

Degree thesis (Master's degree)

Thesis author

Angileri, Chiara

Thesis supervisor

Lodi, Stefano

School

Engineering and Architecture

Degree programme

Artificial intelligence [LM-DM270]

Degree programme regulations