Unsupervised Mondrian forest: a space partition method for clustering

Macrì, Silvia Maria (2021) Unsupervised Mondrian forest: a space partition method for clustering. [Master's degree thesis], Università di Bologna, Degree Programme in Physics [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution - ShareAlike 4.0 (CC BY-SA 4.0)

Download (3MB)

Abstract

Cluster analysis is a family of techniques that aim to divide an unlabeled dataset into groups, so that samples with similar features are assigned to the same group and dissimilar samples to different ones. It has been applied in several fields and comprises a wide variety of techniques, each designed for a specific type of dataset and requiring prior information on its structure. In this thesis we discuss a new formulation of a clustering technique whose structure is similar to that of an unsupervised random forest; it is based on the Mondrian stochastic process and consists of a hierarchical partition of the space on which the given dataset is defined. Its output is an estimate of the probability of belonging to each class, defined on the whole underlying space, and it shows some interesting properties, such as the automatic determination of the number of clusters and the ability to deal with datasets of different shapes. After a brief theoretical introduction to clustering, the Mondrian stochastic process and its main mathematical properties are defined. The Mondrian clustering algorithm is then described, and results of its application to two- and three-dimensional toy datasets are presented. The discussion focuses on the role of the algorithm's parameters, which characterize the method and can be tuned by the user to obtain better performance on different datasets, and on the comparison of the results with those of some notable clustering algorithms. Finally, some interesting aspects that could be further investigated are discussed.
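
As an illustration of the hierarchical space partition mentioned in the abstract, the following is a minimal sketch of the standard generative description of a Mondrian process on an axis-aligned box: the waiting time to the next cut is exponential with rate equal to the sum of the box side lengths, the cut dimension is chosen with probability proportional to side length, and the cut position is uniform along that side. It is not the thesis's clustering algorithm, which is not reproduced on this page; the function name `sample_mondrian` and the parameters `budget` and `start` are hypothetical names chosen for this example.

```python
import random


def sample_mondrian(lower, upper, budget, rng, start=0.0):
    """Sample one Mondrian partition of the box [lower, upper].

    Returns the leaf boxes as a list of (lower, upper) tuples.
    `budget` is the lifetime after which no further cuts are made.
    """
    lengths = [u - l for l, u in zip(lower, upper)]
    rate = sum(lengths)  # linear dimension of the box
    if rate <= 0:
        return [(tuple(lower), tuple(upper))]

    # Time of the next cut: exponential waiting time with rate = linear dimension.
    cut_time = start + rng.expovariate(rate)
    if cut_time > budget:
        # Lifetime exhausted: this box is a leaf of the partition.
        return [(tuple(lower), tuple(upper))]

    # Cut dimension chosen with probability proportional to side length,
    # cut position uniform along that side.
    dim = rng.choices(range(len(lengths)), weights=lengths)[0]
    pos = rng.uniform(lower[dim], upper[dim])

    left_upper = list(upper)
    left_upper[dim] = pos
    right_lower = list(lower)
    right_lower[dim] = pos

    # Recurse independently on both halves, starting from the cut time.
    return (sample_mondrian(lower, left_upper, budget, rng, cut_time)
            + sample_mondrian(right_lower, upper, budget, rng, cut_time))


rng = random.Random(0)
leaves = sample_mondrian([0.0, 0.0], [1.0, 1.0], budget=3.0, rng=rng)
```

Each entry of `leaves` is one cell of the partition of the unit square. A larger `budget` tends to produce more, smaller cells, and a forest in this sense would be a collection of such independently sampled partitions.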

Document type
Degree thesis (Master's degree)
Thesis author
Macrì, Silvia Maria
Thesis supervisor
School
Degree programme
Curriculum
Applied Physics
Degree programme regulations
DM270
Keywords
machine learning, clustering, unsupervised learning, stochastic process, Mondrian process
Thesis defence date
10 December 2021