THESIS
2018
81 pages : color illustrations ; 30 cm
Abstract
Nowadays, Social Networks (SNs) like Facebook and Twitter are very popular.
Thousands of users post tweets every day. In this dissertation, we address
three common problems in processing tweets. Firstly, we select the most
significant messages from a corpus of tweets, so that we can remove noise
from our dataset and extract information only from important messages.
Secondly, we propose a topic detection model that incorporates time and
location. Thirdly, we propose a novel tweet recommendation framework that
is simple and stable.
Concerning the filtering of tweets, we propose a method for classifying tweet
messages into two classes: informative and non-informative. We consider
informative those messages that contain information of public interest, such
as trends, events and news. Non-informative tweets are personal messages
that do not interest the public, such as conversations between friends,
feelings and descriptions of mood. The motivation of our work is to keep
informative tweets that contain essential information and to filter out the
rest. Real applications that can benefit from our work include trend and topic
detection applications, recommendation systems and applications that make
predictions based on user messages on social media.
The challenge in processing tweet messages is that they are short and
unstructured, and their topic is often unclear. We propose a weighted
variation of the binary multinomial naive Bayes model to identify informative
messages. We train our classifier and evaluate the results using 5-fold and
10-fold cross-validation. We compare the results with those of the original
binary multinomial naive Bayes model. We use two independent datasets of
tweet messages crawled from the web. We evaluate and present our results
using the following metrics: accuracy, recall, specificity and F-measure with
its variations (F2 score and F0.5 score).
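As an illustration of the baseline and the metrics named above, the following is a minimal Python sketch of the standard binary multinomial naive Bayes classifier (each word counted at most once per message, with Laplace smoothing) together with accuracy, recall, specificity and the F-beta scores. It is not the weighted variation proposed in the thesis, whose weighting scheme is not described in this abstract. The function names, the smoothing parameter `alpha` and the toy messages are assumptions made for this example only.

```python
import math
from collections import Counter


def train_binary_mnb(docs, labels, alpha=1.0):
    """Fit a baseline binary multinomial naive Bayes model (illustrative only)."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        # Binarise: a word contributes at most once per message.
        word_counts[label].update(set(doc.split()))
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for c in classes:
        total = sum(word_counts[c].values())
        model["prior"][c] = math.log(doc_counts[c] / len(docs))
        model["loglik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    words = set(doc.split()) & model["vocab"]
    scores = {
        c: model["prior"][c] + sum(model["loglik"][c][w] for w in words)
        for c in model["prior"]
    }
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive):
    """Compute accuracy, recall, specificity and F1, F2, F0.5 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    def f_beta(beta):
        # beta > 1 favours recall, beta < 1 favours precision.
        denom = beta * beta * precision + recall
        return (1 + beta * beta) * precision * recall / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "f1": f_beta(1.0),
        "f2": f_beta(2.0),
        "f0.5": f_beta(0.5),
    }


# Toy messages invented for this example; not data from the thesis.
docs = [
    "earthquake hits city centre",
    "election results announced today",
    "feeling sleepy today lol",
    "miss you so much lol",
]
labels = ["informative", "informative", "non-informative", "non-informative"]
model = train_binary_mnb(docs, labels)
print(predict(model, "earthquake results announced"))
```

The F2 score weights recall more heavily than precision, and the F0.5 score does the opposite, which is why reporting both shows how a classifier trades missed informative tweets against noise that passes the filter.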
Concerning topic detection, existing solutions overlook the time and location
factors, which are important and useful. Moreover, social media are updated
frequently, so a detection model should handle dynamic updates. We introduce
a topic model for topic detection that combines time and location. Our model
is equipped with incremental estimation of the parameters of the topic model
and with an adaptive window length that is set according to the correlation
of consecutive windows and their density. We have conducted extensive
experiments to verify the effectiveness and efficiency of our proposed
Incremental Adaptive Time Location (IncrAdapTL) model.
Concerning tweet recommendation, Twitter users post messages according
to their interests and read the tweets of their friends. However, reading
tweets on relevant topics from more users may help them to broaden their
perspective on their interests. Topics combined with time and location are
more useful. For instance, consider someone who works downtown at a finance
corporation during the day and lives with family in another district at
night. During working hours, this user is interested in reading tweets
relevant to finance or related to downtown, but not tweets related to
entertainment. After work, this user is interested in tweets related to
family or entertainment and maybe not in tweets relevant to nightlife.
Our proposed tweet recommendation model consists of three parts. Firstly,
we model users' preferences by using their previously posted tweets, location
and time. Secondly, we model tweet documents by proposing topic-enhanced
document vectors. Thirdly, we train our model and suggest tweets to users.
Our approach offers time-efficient update handling without re-training our
model, and tackles the sparsity problem of (user, tweet) pairs. We evaluate
our model on approximately 1 million real tweets from Hong Kong, and we show
that its performance is stable.