THESIS
2021
1 online resource (8, 89 pages) : color illustrations
Abstract
Knowledge base systems such as Freebase and YAGO have been designed and widely applied, yet most knowledge bases are far from high quality. According to recent research, the low quality is mainly caused by missing and inaccurate RDF triples, which are the main components of knowledge base systems. It is therefore important to propose methods that enhance the RDF triples in knowledge bases, which in turn matters for providing good information retrieval services. At the same time, inaccurate data in a knowledge base originates from its various data sources, so it is important not only to improve the existing knowledge triples but also to address the erroneous data in the data sources. We therefore also explore data cleaning methods that filter and correct erroneous data in order to achieve higher-quality knowledge bases. Overall, both the low-accuracy knowledge triples in knowledge bases and the erroneous data in the data sources are dirty data that need to be repaired.
In this thesis, we formulate the problem of revising low-accuracy knowledge triples and erroneous data in the data sources as a data fusion problem, and we address it in two directions. First, we discuss how to discover and correct low-accuracy knowledge triples in existing knowledge bases. Then we turn to a more general problem: how to discover and correct erroneous data in the data sources, since they are the information providers. In solving both problems, we apply crowdsourcing, a powerful and widely used method in the data field.