GlobalTrait : recognizing personalities in multiple languages using aligned embeddings

HKUST Electronic Theses

by Farhad Bin Siddique

THESIS 2019

M.Phil. Electronic and Computer Engineering

xiii, 74 pages : illustrations ; 30 cm

Abstract

This thesis describes our work on automatic personality detection from text along the Big Five personality trait dimensions. As input, we take text from conversational and written transcriptions, along with social media data consisting of Facebook status updates and Twitter tweets, where each user is labeled with Big Five scores, and we try to identify the user’s personality profile. Our model is a deep learning framework built around a Convolutional Neural Network (CNN) that takes the vector representations of the words as input, extracts the necessary features, and passes them to a fully connected layer followed by a softmax for binary classification of each of the five traits. In addition to recognizing personality from English text, we develop a bilingual model that classifies personality in two languages, using bilingual embeddings to take advantage of the relatively larger amount of data available in English. We show improvement in our multilingual experiments on Chinese.

We further extend our multilingual work to other languages, using a Twitter dataset in four languages (English, Spanish, Dutch and Italian) that consists of user tweets, with the users labeled with personality scores. However, we find that our previous approach of using multilingual embeddings does not give a substantial improvement in the multilingual results. This shows that words with similar contextual meaning in different languages may not correspond to the same personality traits, since people may express personality using different words depending on their cultural or language differences. Therefore, we propose GlobalTrait, a personality alignment method for the multilingual embeddings, such that words that correspond to the same personality trait across languages are closer together in the vector space. By applying this alignment to the embeddings and using them as input to our model, we achieve higher F-scores in the multilingual setting. This method enables us to use the relatively larger amount of data available in high-resource languages such as English to help recognize personality in other, low-resource languages.
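The classifier outlined in the abstract (word vectors fed to a CNN, then a fully connected layer and a softmax giving a binary decision per trait) can be illustrated with a minimal forward-pass sketch. This is not the thesis implementation: the abstract does not state the filter widths, activation, pooling, or whether the traits share parameters, so the ReLU activation, max-over-time pooling, one independent classifier per trait, random weights, and every name below (`cnn_forward`, `TRAITS`, the dimensions) are assumptions made purely for illustration.

```python
import numpy as np

# The five Big Five traits; each gets its own binary classifier (an assumption).
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def cnn_forward(embedded, filters, bias, fc_w, fc_b):
    """Forward pass for one trait.

    embedded: (seq_len, dim) word vectors for one text
    filters:  (n_filters, width, dim) convolution filters
    bias:     (n_filters,) convolution bias
    fc_w:     (2, n_filters) fully connected weights
    fc_b:     (2,) fully connected bias
    Returns a length-2 probability vector (low / high on the trait).
    """
    n_filters, width, dim = filters.shape
    seq_len = embedded.shape[0]
    # Every window of `width` consecutive words: (n_windows, width, dim)
    windows = np.stack([embedded[i:i + width]
                        for i in range(seq_len - width + 1)])
    # Convolution as a contraction over (width, dim): (n_windows, n_filters)
    feature_maps = np.tensordot(windows, filters, axes=([1, 2], [1, 2])) + bias
    feature_maps = np.maximum(feature_maps, 0.0)   # ReLU (assumed)
    pooled = feature_maps.max(axis=0)              # max-over-time pooling (assumed)
    return softmax(fc_w @ pooled + fc_b)


# Toy input: 12 "words" with 8-dimensional random embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
seq_len, dim, n_filters, width = 12, 8, 4, 3
embedded = rng.normal(size=(seq_len, dim))

predictions = {}
for trait in TRAITS:
    # Untrained random parameters, one set per trait.
    filters = rng.normal(scale=0.1, size=(n_filters, width, dim))
    bias = np.zeros(n_filters)
    fc_w = rng.normal(scale=0.1, size=(2, n_filters))
    fc_b = np.zeros(2)
    predictions[trait] = cnn_forward(embedded, filters, bias, fc_w, fc_b)

for trait, probs in predictions.items():
    print(trait, probs)
```

In a multilingual setting, the same forward pass would apply unchanged: only the embedding lookup that produces `embedded` differs, which is why the choice and alignment of the embeddings matters for cross-lingual transfer.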

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Electronic and Computer Engineering
Authors: Siddique, Farhad Bin
Subjects: Personality; Social sciences; Data processing; Automatic classification; Computational linguistics
Language: English
Call number: Thesis ECED 2019 Siddiq
DOI: 10.14711/thesis-991012730762603412
