New Markov chain based methods for single and cross-domain sentiment classification

Pagliarani, Andrea (2015) New Markov chain based methods for single and cross-domain sentiment classification. [Master's thesis], Università di Bologna, Degree Programme in Ingegneria e scienze informatiche [LM-DM270] - Cesena

Abstract

Communication is shifting from a centralized scenario, in which media such as newspapers, radio and TV programs produce information and people are merely consumers, to a decentralized one, in which anyone can produce information through social networks, blogs and forums that allow real-time, worldwide information exchange. Owing to their widespread diffusion, these new instruments have started to play an important socio-economic role. They are the most widely used communication media and, as a consequence, the main source of information that enterprises, political parties and other organizations can rely on. The data stored on servers all over the world can be analyzed by means of Text Mining techniques such as Sentiment Analysis, which aims to extract opinions from huge amounts of unstructured text. This makes it possible to determine, for instance, the degree of user satisfaction with products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches rely on a Markov Chain based model that is language independent and whose key strengths are simplicity and generality, which make it attractive compared with previous, more sophisticated techniques. Every technique discussed has been tested in both Single-Domain and Cross-Domain Sentiment Classification, and its performance has been compared with that of two previous works. The analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature, on both single-domain and cross-domain tasks, in $2$-class (i.e. positive and negative) Document Sentiment Classification.
However, there is still room for improvement: this work also indicates how performance could be enhanced, namely that a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-class Single-Domain Sentiment Classification, future work will also validate these results in tasks with more than $2$ classes.

Document type

Master's thesis (Laurea magistrale)

Thesis author

Pagliarani, Andrea

Thesis supervisor

Moro, Gianluca

Thesis co-supervisor

Domeniconi, Giacomo

School

Ingegneria e Architettura

Degree programme

Ingegneria e scienze informatiche [LM-DM270] - Cesena

Degree programme regulation

DM270

Keywords

Text Mining, Opinion Mining, Sentiment Analysis, Markov Model, Regular Markov Model, Hidden Markov Model, multi source, document expansion, term weighting, feature selection, preprocessing, stemming, lemmatization, stop words

Thesis defense date

19 March 2015

URI

https://amslaurea.unibo.it/id/eprint/8445
