THESIS
2019
xiii, 74 pages : illustrations ; 30 cm
Abstract
This thesis describes our work on automatic personality detection from text along the
Big Five personality trait dimensions. As input, we take text from conversational and
written transcriptions, along with social media data consisting of Facebook status updates
and Twitter tweets, in which each user is labeled with Big Five scores, and we try to
identify each user’s personality profile. Our model is a deep learning framework built on
a Convolutional Neural Network (CNN). It takes the vector representations of the words as
input, extracts the relevant features, and passes them to a fully connected layer followed
by a softmax for binary classification of each of the five traits. In addition to
recognizing personality from English text, we develop a bilingual model that classifies
personality in two languages, using bilingual embeddings to take advantage of the
relatively larger amount of data available in English. We show improvement in our
multilingual experiments on Chinese.
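To make the CNN pipeline described above concrete, the following is a minimal, dependency-free sketch of the forward pass only. It covers word vectors, a 1-D convolution with ReLU, max-over-time pooling, a fully connected layer, and a softmax over two classes for each trait. The abstract does not specify most of these details, so the following are all illustrative assumptions: the embedding size, filter width, number of filters, max-over-time pooling, the random untrained weights, and the use of one independent classifier per trait. All function names are hypothetical. A real model would be trained with a deep learning library.

```python
import math
import random

# Illustrative hyperparameters (assumptions, not values from the thesis).
EMB_DIM = 4       # size of each word vector
NUM_FILTERS = 3   # number of convolution filters
WINDOW = 2        # filter width, in words
NUM_CLASSES = 2   # binary decision per trait

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def conv_maxpool(words, filters, biases):
    """1-D convolution over word windows, ReLU, then max-over-time pooling."""
    features = []
    for f, b in zip(filters, biases):
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial maximum
        for start in range(len(words) - WINDOW + 1):
            window = words[start:start + WINDOW]
            # Dot product of the filter with the concatenated window vectors.
            s = b + sum(f[k][d] * window[k][d]
                        for k in range(WINDOW) for d in range(EMB_DIM))
            best = max(best, max(0.0, s))
        features.append(best)
    return features


def dense(features, weights, biases):
    """Fully connected layer: one logit per output class."""
    return [b + sum(w * x for w, x in zip(row, features))
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_params(rng):
    """Untrained, randomly initialized weights for one trait classifier."""
    return {
        "filters": [[[rng.uniform(-0.5, 0.5) for _ in range(EMB_DIM)]
                     for _ in range(WINDOW)] for _ in range(NUM_FILTERS)],
        "conv_b": [0.0] * NUM_FILTERS,
        "fc_w": [[rng.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)]
                 for _ in range(NUM_CLASSES)],
        "fc_b": [0.0] * NUM_CLASSES,
    }


def predict_trait(words, params):
    """Class probabilities (e.g. low vs. high) for a single trait."""
    feats = conv_maxpool(words, params["filters"], params["conv_b"])
    return softmax(dense(feats, params["fc_w"], params["fc_b"]))


# Toy usage: one independent classifier per trait (an assumption) applied
# to a "document" of five random word vectors.
rng = random.Random(0)
models = {t: random_params(rng) for t in TRAITS}
doc = [[rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for _ in range(5)]
for trait in TRAITS:
    probs = predict_trait(doc, models[trait])
    print(trait, [round(p, 3) for p in probs])
```

With trained weights, the class with the higher probability would be read as the predicted label for that trait. Here the weights are random, so the printed probabilities carry no meaning beyond showing the data flow.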
We further extend our multilingual work to other languages, using a Twitter dataset in
four languages (English, Spanish, Dutch and Italian) consisting of user tweets, with the
users labeled with personality scores. However, we find that our previous approach of
using multilingual embeddings does not give a substantial improvement in the multilingual
results. This shows that words with similar contextual meaning in different languages
may not correspond to the same personality traits, since people may express personality
using different words depending on their cultural or linguistic background.
Therefore, we propose GlobalTrait, a personality alignment method for multilingual
embeddings that brings words corresponding to the same personality trait across languages
closer together in the vector space. By applying this alignment to the embeddings and
using the aligned embeddings as input to our model, we achieve higher F-scores in our
multilingual experiments. This method enables us to use the relatively larger amount of
data available in high-resource languages such as English to help recognize personality
in other low-resource languages.
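The abstract does not describe how GlobalTrait performs its alignment. As one hypothetical way to realize the stated goal of moving words tied to the same trait in different languages closer together, the sketch below applies a retrofitting-style update (Faruqui et al., 2015). Each word vector is repeatedly pulled toward the current vectors of other words that share its trait label, while a weight `alpha` keeps it near its original position. This is a swapped-in technique, not the thesis's method. The word-trait labels, the toy vectors, `alpha`, and the function names are all illustrative assumptions.

```python
import math


def retrofit(vectors, trait_of, alpha=1.0, iterations=10):
    """Hypothetical stand-in for trait alignment (not GlobalTrait itself).

    vectors:  dict word -> list[float]; original embeddings, left unchanged
    trait_of: dict word -> trait label (hypothetical word-trait annotations);
              every labeled word must also appear in `vectors`
    alpha:    weight that keeps each word close to its original vector
    """
    # Group labeled words by trait; unlabeled words keep their original vector.
    groups = {}
    for word, trait in trait_of.items():
        groups.setdefault(trait, []).append(word)

    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, trait in trait_of.items():
            neighbors = [n for n in groups[trait] if n != word]
            if not neighbors:
                continue
            beta = 1.0 / len(neighbors)  # each same-trait neighbor weighted equally
            # In-place update: neighbors already updated in this sweep are used.
            new[word] = [
                (alpha * vectors[word][d] + beta * sum(new[n][d] for n in neighbors))
                / (alpha + beta * len(neighbors))
                for d in range(len(vectors[word]))
            ]
    return new


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy data: placeholder tokens from two languages, labeled with made-up traits.
vectors = {
    "en:word_a": [1.0, 0.0],
    "es:word_b": [0.0, 1.0],
    "en:word_c": [-1.0, 0.0],
    "es:word_d": [0.0, -1.0],
}
trait_of = {
    "en:word_a": "extraversion",
    "es:word_b": "extraversion",
    "en:word_c": "neuroticism",
    "es:word_d": "neuroticism",
}
aligned = retrofit(vectors, trait_of)
before = distance(vectors["en:word_a"], vectors["es:word_b"])
after = distance(aligned["en:word_a"], aligned["es:word_b"])
print(f"same-trait distance: {before:.3f} -> {after:.3f}")
```

The anchor term is the design choice that matters here. Without `alpha`, every word labeled with a trait would collapse onto a single point and lose the contextual information that the original multilingual embeddings carry.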