Calarota, Gabriele
(2021)
On Authorship Attribution.
[Master's thesis], Università di Bologna, Degree Programme in
Computer Science (Informatica) [LM-DM270]
Abstract
Authorship attribution is the process of identifying the author of a given text. From a
machine learning perspective, it can be framed as a classification problem. The literature
offers many classification methods, each of which relies on some form of feature extraction.
In this thesis, we explore information retrieval techniques such as Doc2Vec, together with
other useful feature selection and extraction techniques, and evaluate them with different
classifiers. The main purpose of this work is to lay the foundations of feature extraction
techniques in authorship attribution. Finally, we compare our results with related work and
show how, to the best of our knowledge, we improved on the best previously reported results
for a particular dataset that is well known in this field.
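The abstract frames authorship attribution as classification: extract features from each text, then let a classifier choose the author. The following is a minimal, standard-library-only sketch of that framing, not the thesis's method. It substitutes character trigram frequency profiles for the Doc2Vec features and a nearest-profile cosine classifier for the classifiers (such as SVM) studied in the thesis. The function names and toy texts are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Feature extraction (illustrative): frequency profile of character n-grams."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(labelled_texts, n=3):
    """Build one aggregate n-gram profile per author by summing counts.

    Cosine similarity ignores overall scale, so the sum points in the same
    direction as the per-author mean of the raw counts.
    """
    profiles = {}
    for author, text in labelled_texts:
        profiles.setdefault(author, Counter()).update(char_ngrams(text, n))
    return profiles


def attribute(profiles, text, n=3):
    """Classification: return the author whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


# Hypothetical toy corpus with two "authors".
profiles = train([
    ("alice", "The cat sat on the mat. The cat was happy."),
    ("bob", "Quantum fields fluctuate; quantum states collapse."),
])
print(attribute(profiles, "The happy cat sat."))  # alice
```

In a realistic setup, the feature extractor (for example, Doc2Vec embeddings) and the classifier (for example, an SVM) would each be swapped in independently. Keeping the two stages as separate functions mirrors that separation.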
Document type
Thesis (Master's degree)
Thesis author
Calarota, Gabriele
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM A: TECNICHE DEL SOFTWARE (Software Techniques)
Degree programme regulations
DM270
Keywords
authorship attribution, machine learning, svm, reuters corpus, gdelt, amazon food reviews, tpot, supervised learning, the guardian newspaper
Thesis defence date
18 March 2021
URI