THESIS
2021
1 online resource (xii, 130 pages) : illustrations (chiefly color)
Abstract
Supervised learning methods, such as deep learning, have
achieved great success in text mining. When researchers develop these giant models, they
usually assume that massive annotated training data are available. However, the real-world usefulness
of these models is impaired because, in practice, readily available annotated data
are scarce. Unlabeled training data can also be rare in some circumstances, such as personalization
modeling. Fortunately, useful information exists in many inexpensive and readily available
resources. In this thesis, we show how to utilize auxiliary information from text and graphs to
alleviate the scarcity of training data in sentiment analysis, text classification, and personalized
word embedding tasks. To alleviate the scarcity of annotated training data, we utilize auxiliary
information to obtain supervision signals, so that models are trained using these signals rather than
annotations. To alleviate the scarcity of unlabeled training data, we utilize auxiliary information
to design heuristics to enrich training data. For example, for sentiment analysis, supervision can
be opinion words in the text. For text classification, supervision can be words generated by a
masked language model when queried about the topic of a document. For personalized word embedding
learning, the auxiliary information can be a social graph. To utilize supervision signals from text data, we propose a variational weakly supervised framework for the sentiment analysis and
text classification tasks. To utilize auxiliary information from graph data, we use a social network as a
regularizer that encourages each user to learn from friends’ corpora, thus gaining more training data.