Integration of artificial intelligence and network approaches to classify data in social networks

Barbieri, Niccoló (2024) Integration of artificial intelligence and network approaches to classify data in social networks. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Physics [LM-DM270], Restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text not accessible until 31 March 2025.
Available under licence: Creative Commons: Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)


Abstract

Arguably one of the biggest challenges of our generation, the COVID-19 pandemic has caused unpredictably high levels of distress and death on a global scale. The rapid production of vaccines has sparked a heated discussion about their safety, fueled by the ability of today’s social media platforms to disseminate information. In this thesis, we examined about 20 million Italian-language tweets gathered at different points in the pandemic. Seven thousand tweets from the early COVID-19 outbreak were drawn from the dataset and manually annotated according to the opinion they express about vaccines: pro-, anti-, or neutral. An additional 700 tweets were gathered at later stages, after the original acquisition, and annotated in the same way. Text embeddings of the tweets were then computed with BERT, yielding an initial classification accuracy of 52% for the three-label task and 66% for the two-label task. Afterwards, a network of retweets was built to extract additional information. First, two hierarchical clustering algorithms were used to separate users into communities. Using the number of neighbours each user has in each community, a proximity measure was estimated for these communities. Subsequently, an algorithm was employed to generate a two-dimensional user embedding, which served as an extra feature for describing tweets. Combining these network-based features with the text embedding greatly improved the results: accuracy reached 62% for the three-label classification and 83% for the two-label classification. Finally, the temporal dependence of this kind of algorithm was confirmed by comparing these outcomes with those of the same classifier applied to a dataset of tweets from a later period.
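
As a rough illustration of the network-based step described in the abstract, the sketch below counts, for each user, how many retweet neighbours fall in each community and appends the normalised counts to a text embedding. It is not the thesis's implementation: the function names, the toy edge list, the community labels, the choice to treat retweets as undirected, and the three-number stand-in for a BERT vector are all assumptions made for illustration, and the exact proximity measure used in the thesis may differ.

```python
from collections import defaultdict


def neighbour_community_profile(edges, community_of, n_communities):
    """Return, per user, the fraction of retweet neighbours in each community.

    Assumptions (hypothetical, not taken from the thesis): retweet links are
    treated as undirected, and `community_of` maps each user to an integer
    community label produced beforehand by some clustering algorithm.
    """
    # Build an undirected adjacency map from the retweet pairs.
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-retweets
            neighbours[u].add(v)
            neighbours[v].add(u)

    profiles = {}
    for user, nbrs in neighbours.items():
        counts = [0] * n_communities
        for n in nbrs:
            if n in community_of:
                counts[community_of[n]] += 1
        total = sum(counts)
        # Normalise counts to fractions; users with no labelled neighbours get zeros.
        profiles[user] = [c / total if total else 0.0 for c in counts]
    return profiles


def combine_features(text_embedding, user_profile):
    """Concatenate a tweet's text embedding with its author's network profile."""
    return list(text_embedding) + list(user_profile)


# Toy data: five users, two communities (all values are made up).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("c", "d")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

profiles = neighbour_community_profile(edges, community_of, n_communities=2)

# A placeholder three-dimensional vector standing in for a BERT text embedding.
features = combine_features([0.1, -0.3, 0.7], profiles["c"])
```

In this toy graph, user "c" has two neighbours in community 0 and one in community 1, so its profile leans towards community 0; the combined vector could then be passed to any standard classifier.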

Document type
Master's thesis (Laurea magistrale)
Thesis author
Barbieri, Niccoló
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Applied Physics
Degree regulations
DM270
Keywords
Network, Clustering, Twitter, COVID-19, Classification
Thesis defence date
27 March 2024
URI
