THESIS
2021
1 online resource (xv, 153 pages) : illustrations (some color)
Abstract
Many openly available knowledge bases (KBs) have been constructed to support knowledge-centric
applications such as search engines and online recommendation. However, most openly
available KBs are incomplete, because they are not kept synchronized with facts emerging
in the real world. Knowledge base population (KBP) from external data sources, which
extracts knowledge from unstructured text to populate KBs, has therefore become a vital task. Recent research
proposes two types of solutions that partially address this problem, but their performance
is limited. The first, dynamic KB construction from unstructured text, requires
specifying in advance which predicates are of interest to the KB; this preliminary setup makes it
unsuitable for an in-time population scenario. The second, Open Information Extraction
(Open IE) from unstructured text, has difficulty producing facts that can be directly linked to
the target KB without redundancy and ambiguity.
In this thesis, we investigate the end-to-end KBP task from unstructured text in external data
sources with the support of Open IE, which involves three major research problems.
First, we address the knowledge canonicalization problem, which jointly canonicalizes the noun
phrases and relational phrases in the Open IE triples to remove redundant and ambiguous
facts. We propose SIST, an efficient canonicalization model leveraging the side information from
the context of the original data sources.
Second, we study the knowledge fusion problem, which aims to determine the most complete
and accurate aggregated facts from diverse and conflicting data sources. We propose DART,
an integrated Bayesian approach that comprehensively incorporates the domain expertise of the
data sources, to infer the multiple possible truths of a fact.
Third, we investigate the knowledge linking problem, which jointly disambiguates the entities and relations
in the extracted facts and links them to existing concepts in the current KBs. We
propose KBPearl as a solution under the global coherence assumption that all the entities and predicates
mentioned in the same short-text document are densely related to each other. Specifically, we
employ a semantic graph-based approach to capture the knowledge in the source document, and to
determine the best linking results by finding the densest subgraph effectively and efficiently. Moreover,
we also propose TENET as a solution under the sparse coherence assumption that not every
pair of entities or predicates in a long-text document is strongly related to each other. Specifically,
we formulate the joint entity and relation linking task as a minimum-cost rooted tree cover problem
on the knowledge coherence graph constructed based on the document, and propose approximation
algorithms with pruning strategies to address this problem and derive the linking results.
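
To make the densest-subgraph idea above concrete, here is a minimal sketch of the classic greedy peeling heuristic, which repeatedly removes the lowest-degree node and keeps the densest intermediate subgraph. This is a generic textbook technique, not the algorithm used in KBPearl; the function name `densest_subgraph_peeling`, the unweighted edge-list input, and the density measure |E|/|V| are all assumptions made for illustration.

```python
from collections import defaultdict


def densest_subgraph_peeling(edges):
    """Greedy peeling heuristic for the densest subgraph (density = |E| / |V|).

    Hypothetical helper for illustration only; the thesis's own graph
    construction and scoring are not described in the abstract.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    num_edges = sum(len(neigh) for neigh in adj.values()) // 2
    best_nodes, best_density = set(remaining), 0.0

    while remaining:
        density = num_edges / len(remaining)
        if density > best_density:
            best_density = density
            best_nodes = set(remaining)
        # Peel off the node with the smallest current degree
        # (ties broken by name so the result is deterministic).
        u = min(remaining, key=lambda n: (len(adj[n]), str(n)))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        adj[u] = set()
        remaining.remove(u)

    return best_nodes, best_density


# Four mutually related candidates plus one loosely attached candidate "e".
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
nodes, density = densest_subgraph_peeling(edges)
print(sorted(nodes), density)
```

In this toy graph the loosely attached node is peeled first, leaving the four tightly connected nodes as the densest group, which mirrors the intuition that coherent entities and predicates in a short document should reinforce one another.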
We demonstrate the effectiveness and efficiency of the proposed solutions to each of the above
problems against state-of-the-art techniques, through extensive experiments on real-world datasets.
Finally, we conclude the thesis with future research directions and challenges related to the KBP
task.