THESIS
2015
xi, 64 pages : illustrations ; 30 cm
Abstract
The emergence of micro-blogging services has changed the way people share
information on the Web. Their rapidly growing worldwide popularity makes
prominent social networking services such as Twitter a potentially large information
base. People use tweets to share opinions and sentiments about what is going
on around them, and sentiment analysis of these short, informal texts is now
attracting increasing attention. However, distinctive characteristics of tweets,
such as creative spelling and punctuation and genre-specific terminology, pose
new challenges that cannot be met by simply applying the traditional information extraction
technologies that have proved successful on Web corpora.
In this thesis, we design a sentiment analysis system based on a Support Vector
Machine classification model, leveraging a variety of stylistic, lexical, and syntactic features. Using external resources such as Tweet-NLP and an emoticon dictionary, we
propose a tweet-specific preprocessing method to handle the informal text genre
of tweets. In addition, to capture contextual interactions among words for
sentiment analysis, we incorporate dependency parsing with the Stanford Parser into our
system, which gives it a competitive advantage in SemEval-2015 Task 10: Sentiment Analysis in Twitter. Our system placed sixth in the message-level
task on the Twitter2015 test set, obtaining a macro-averaged F-score of 63.00.
Finally, we compare the performance of the classifier under different feature combinations through ablation experiments. The results reveal that the syntactic features
and the lexical features based on automatic tweet-specific sentiment lexicons are
the most influential feature groups in our sentiment analysis system, providing
gains of 2-7 percentage points on the test datasets.
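As an illustration of the reported metric, the sketch below computes a macro-averaged F-score for message-level sentiment labels. It assumes the convention commonly used for the SemEval Twitter sentiment tasks, in which F1 is averaged over the positive and negative classes only and scaled to 0-100; the abstract does not spell out the formula, so this should be read as an assumption rather than the thesis's own evaluation code. The function names and label strings are hypothetical.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one class, computed from true/false positives and false negatives."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No correct predictions for this class: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_pos_neg(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of positive and negative F1, on a 0-100 scale (assumed convention)."""
    f_pos = f1_for_label(gold, pred, "positive")
    f_neg = f1_for_label(gold, pred, "negative")
    return 100.0 * (f_pos + f_neg) / 2


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "negative", "positive", "neutral"]
    print(macro_f_pos_neg(gold, pred))
```

Under this convention the neutral class still matters indirectly: a neutral tweet mislabeled as positive lowers the precision of the positive class.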