New markov chain based methods for single and cross-domain sentiment classification

Pagliarani, Andrea (2015) New markov chain based methods for single and cross-domain sentiment classification. [Laurea magistrale], Università di Bologna, Corso di Studio in Ingegneria e scienze informatiche [LM-DM270] - Cesena
Documenti full-text disponibili:
[thumbnail of Pagliarani_Andrea_tesi.pdf]
Anteprima
Documento PDF
Download (872kB) | Anteprima

Abstract

Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.

Abstract
Tipologia del documento
Tesi di laurea (Laurea magistrale)
Autore della tesi
Pagliarani, Andrea
Relatore della tesi
Correlatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
Text Mining, Opinion Mining, Sentiment Analysis, Markov Model, Regular Markov Model, Hidden Markov Model, multi source, document expansion, term weighting, feature selection, preprocessing, stemming, lemmatization, stop words
Data di discussione della Tesi
19 Marzo 2015
URI

Altri metadati

Statistica sui download

Gestione del documento: Visualizza il documento

^