Barbieri, Niccoló
(2024)
Integration of artificial intelligence and network approaches to classify data in social networks.
[Master's thesis], Università di Bologna, Degree programme in
Physics [LM-DM270], Restricted-access document.
Abstract
Arguably one of the biggest challenges of our generation, the COVID-19 pandemic has
caused unpredictably high levels of distress and death on a global scale. The rapid
production of vaccines sparked a heated discussion about their safety, fueled by the
ability of today’s social media platforms to disseminate information. In this thesis, we
examined about 20 million Italian-language tweets gathered at different points in the
pandemic. Seven thousand tweets from the early COVID-19 outbreak were drawn from the
dataset and manually annotated according to the opinion they express about vaccines: pro-,
anti-, or neutral. An additional 700 tweets, gathered in later stages following the
original acquisition, were annotated in the same way. Text embeddings of the tweets were
computed with BERT and yielded an initial classification accuracy of 52% for the
three-label task and 66% for the two-label task. A retweet network was then built to
extract further information. First, two hierarchical clustering algorithms were used to
separate users into communities. A proximity measure for these communities was estimated
from the number of neighbours each user has in each community. An algorithm was then used
to generate a two-dimensional user embedding, which served as an extra feature for
describing each tweet. Combining these network-based features with the text embedding
raised the accuracy to 62% for the three-label classification and 83% for the two-label
classification. Finally, the temporal dependence of this kind of algorithm was confirmed
by comparing these results with those of the same classifier applied to a dataset of
tweets from a later period.
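
The network step described in the abstract (counting how many neighbours each user has in each community, then attaching the result to the text features) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the clustering algorithms, the proximity measure, or the two-dimensional embedding algorithm, so the sketch assumes community labels are already given, treats the retweet network as undirected, and uses plain per-community neighbour fractions as the user feature. All function names and the toy data are hypothetical.

```python
from collections import Counter


def community_neighbour_counts(edges, community_of):
    """Count, for each user, how many neighbours fall in each community.

    Assumption: the retweet network is treated as undirected here.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {
        user: Counter(community_of[n] for n in nbrs if n in community_of)
        for user, nbrs in neighbours.items()
    }


def user_features(counts, communities):
    """Turn raw counts into fractions, one value per community (an assumption)."""
    feats = {}
    for user, c in counts.items():
        total = sum(c.values())
        feats[user] = [c[k] / total if total else 0.0 for k in communities]
    return feats


def combine(text_embedding, user_vec):
    """Append the network-based user features to a tweet's text embedding."""
    return list(text_embedding) + list(user_vec)


# Toy retweet network: users a, b, c in community 0; users d, e in community 1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

counts = community_neighbour_counts(edges, community_of)
feats = user_features(counts, communities=[0, 1])

# A made-up two-dimensional "text embedding" for a tweet written by user d.
tweet_vector = combine([0.1, 0.2], feats["d"])
print(tweet_vector)
```

In this toy network, user d has one neighbour in each community, so its feature vector sits between the two groups, while users a and e are tied entirely to their own community. A classifier trained on the combined vector can then draw on both what a tweet says and where its author sits in the retweet network.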
Document type
Degree thesis
(Master's degree)
Thesis author
Barbieri, Niccoló
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Applied Physics
Degree programme regulations
DM270
Keywords
Network, Clustering, Twitter, covid-19, Classification
Thesis defence date
27 March 2024
URI