Macrì, Silvia Maria
(2021)

*Unsupervised Mondrian forest: a space partition method for clustering.*
[Master's degree (Laurea magistrale)], Università di Bologna, Degree Programme in Physics [LM-DM270]

## Abstract

Cluster analysis is a set of techniques whose aim is to divide an unlabeled dataset into groups, so that samples with similar features are assigned to the same group and dissimilar samples are assigned to different ones.
It has been applied in many fields and comprises a wide variety of techniques, each designed for a specific type of dataset and requiring some prior information about its structure.
In this thesis we discuss a new clustering technique whose structure is similar to that of an unsupervised random forest;
it is based on the Mondrian stochastic process and consists of a hierarchical partition of the space on which the given dataset is defined.
Its output is an estimate of the probability of belonging to each class, defined over the whole underlying space, and it has some interesting properties, such as the automatic determination of the number of clusters and the ability to handle datasets of different shapes.
After a brief theoretical introduction to clustering, the Mondrian stochastic process and its main mathematical properties are defined.
The Mondrian clustering algorithm is then described, and the results of its application to two- and three-dimensional toy datasets are presented;
the discussion focuses on the role of the algorithm's parameters, which characterize the method and can be tuned by the user to obtain better performance on different datasets, and on a comparison of the results with those of some notable clustering algorithms.
Finally, some aspects that could be investigated further are discussed.
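The abstract describes a hierarchical partition of space driven by the Mondrian process. The following is a minimal sketch of the standard generative construction of that process on an axis-aligned box. A cell draws an exponential cost whose rate is the sum of its side lengths. If the cost exceeds the remaining budget (lifetime), the cell becomes a leaf. Otherwise, a dimension is chosen with probability proportional to its side length, a cut is placed uniformly along it, and both halves recurse with the reduced budget. This sketch illustrates only the space partition; it is not the thesis's clustering algorithm. The names `MondrianNode`, `sample_mondrian` and `leaves`, and the budget value used, are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A box is a list of (low, high) intervals, one per dimension.
Box = List[Tuple[float, float]]


@dataclass
class MondrianNode:
    box: Box
    dim: Optional[int] = None      # dimension of the cut (None for a leaf)
    cut: Optional[float] = None    # cut location along `dim`
    left: Optional["MondrianNode"] = None
    right: Optional["MondrianNode"] = None


def sample_mondrian(box: Box, budget: float, rng: random.Random) -> MondrianNode:
    """Recursively sample a Mondrian partition of `box` with lifetime `budget`."""
    lengths = [hi - lo for lo, hi in box]
    linear_dim = sum(lengths)  # rate of the exponential cost
    node = MondrianNode(box=box)
    if linear_dim <= 0:
        return node
    cost = rng.expovariate(linear_dim)
    if cost > budget:
        return node  # budget exhausted: this cell is a leaf
    # Choose the cut dimension with probability proportional to side length.
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    x = rng.uniform(lo, hi)  # cut position, uniform along dimension d
    left_box, right_box = list(box), list(box)
    left_box[d] = (lo, x)
    right_box[d] = (x, hi)
    node.dim, node.cut = d, x
    node.left = sample_mondrian(left_box, budget - cost, rng)
    node.right = sample_mondrian(right_box, budget - cost, rng)
    return node


def leaves(node: MondrianNode) -> List[Box]:
    """Return the boxes of all leaf cells, which together partition the root box."""
    if node.left is None or node.right is None:
        return [node.box]
    return leaves(node.left) + leaves(node.right)


if __name__ == "__main__":
    rng = random.Random(0)
    tree = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=3.0, rng=rng)
    print(len(leaves(tree)), "leaf cells")
```

A larger budget produces finer partitions on average. Sampling several independent trees in this way yields a forest of random partitions. How the thesis turns such partitions into cluster assignments and class probabilities is not reproduced here.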

Document type

Degree thesis
(Master's degree)

Thesis author

Macrì, Silvia Maria

Thesis supervisor

School

Degree programme

Curriculum

Applied Physics

Degree programme regulations (Ordinamento CdS)

DM270

Keywords

machine learning, clustering, unsupervised learning, stochastic process, Mondrian process

Thesis defence date

10 December 2021

URI
