THESIS
2021
1 online resource (xiii, 86 pages) : illustrations (some color)
Abstract
Deep learning models have achieved considerable success in Information Extraction (IE) from text. Such
models usually require a large number of labeled training samples. Since human annotation can be
difficult and time-consuming, automatically generated weak supervision is widely leveraged.
We investigate the creation and the use of weak annotations for IE through two tasks: Aspect and
Opinion Term Extraction (AOTE) and Entity Typing. These tasks correspond, respectively, to the two
kinds of operations an IE system needs to carry out: extracting spans of interest and assigning types to them.
First, we are interested in generating context-dependent weak annotations without much human
effort. For AOTE, we propose an approach to annotating a large number of training samples with
automatic annotation rules. The rules are mined from a small human-labeled sample set, and thus
do not need to be designed manually. For the task of entity typing, we propose an approach that
generates entity type labels by exploiting a pretrained masked language model.
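
As a rough illustration of the masked-language-model idea, a pretrained model can be asked to fill in a type-denoting word next to an entity mention, and its top predictions can serve as weak type labels. The prompt template, model choice, and helper function below are assumptions made for the sketch, not the exact procedure of the thesis:

    # Sketch: weak entity type labels from a pretrained masked language model.
    # Assumptions: Hugging Face transformers is installed; the prompt and the
    # model are illustrative choices, not the thesis's configuration.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-cased")

    def weak_type_labels(sentence, mention, top_k=5):
        # Append a type-denoting prompt after the sentence containing the mention.
        prompt = f"{sentence} {mention} is a [MASK]."
        # Each predicted token is treated as a candidate (weak) type label with a score.
        return [(p["token_str"], p["score"]) for p in fill_mask(prompt, top_k=top_k)]

    # Example: candidate labels for "Leonardo da Vinci" might include words such as
    # "painter" or "artist" (illustrative, model-dependent output).
    print(weak_type_labels("Leonardo da Vinci painted the Mona Lisa.", "Leonardo da Vinci"))

In practice the predicted words would still need to be mapped onto the target type vocabulary; the sketch only shows where the weak signal comes from.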
For the use of the generated weak annotations, we consider two settings. In the first setting, only a
set of weakly labeled samples is available; here, we propose to improve the performance of an entity
typing model by leveraging external knowledge. In the second setting, both a set of weakly labeled
samples and a small set of human-annotated samples are available. We show that pretraining neural
models with weak supervision and then fine-tuning them on human-annotated data can yield good results.
Then, for the task of entity typing, we investigate a framework that obtains a better-performing system
by first training multiple models with the weakly labeled data and then stacking them with the help of
a small, high-quality sample set.
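
A minimal sketch of the stacking step, assuming each base model has already been trained on weakly labeled data and exposes a predict_proba interface, and treating typing as single-label classification for simplicity (the meta-classifier choice and the interfaces are assumptions, not the thesis's implementation):

    # Sketch: stack weakly supervised base models with a meta-classifier
    # fitted on a small set of human-annotated samples.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_stacker(base_models, clean_inputs, clean_labels):
        # Base-model scores on the high-quality samples become meta-features.
        meta_features = np.hstack([m.predict_proba(clean_inputs) for m in base_models])
        meta_model = LogisticRegression(max_iter=1000)
        meta_model.fit(meta_features, clean_labels)  # fitted on human-annotated labels only
        return meta_model

    def predict_stacked(meta_model, base_models, inputs):
        meta_features = np.hstack([m.predict_proba(inputs) for m in base_models])
        return meta_model.predict(meta_features)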