A novel scalable join processor over large RDF graphs with linkage information aware

HKUST Electronic Theses

by Yincheng Lin

THESIS 2011

M.Phil. Computer Science and Engineering

ix, 35 p. : ill. ; 30 cm

Abstract

RDF (Resource Description Framework), developed by the W3C, is a data description format for the Semantic Web. As the Semantic Web develops, RDF datasets integrated from many sources keep growing. Because of the volume of the data and its schema-free nature, efficient processing remains a major challenge for RDF data management, and much research has addressed this problem. The property table approach discovers correlations among predicates and stores related data in the same table, so that queries can be processed as in a relational database. The column store approach focuses on each individual predicate: it partitions the RDF data into separate tables by predicate and builds indices for each table. RDF-3X, a RISC-style engine for managing RDF data efficiently, keeps the original triple format and builds indices over all possible permutations of the triple components.
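The two storage ideas named above can be illustrated with a minimal sketch. It is not taken from the thesis or from any of the systems mentioned: the toy triples and the function names `partition_by_predicate` and `permutation_indices` are hypothetical, and plain sorted Python lists stand in for real on-disk tables and indices.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy triples (subject, predicate, object); not from the thesis.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "hkust"),
    ("bob", "worksAt", "hkust"),
]


def partition_by_predicate(triples):
    """Column-store style layout: one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def permutation_indices(triples):
    """Sorted copies of the triples under every ordering of S, P and O."""
    position = {"S": 0, "P": 1, "O": 2}
    indices = {}
    for order in permutations("SPO"):
        key = "".join(order)
        indices[key] = sorted(
            tuple(t[position[c]] for c in order) for t in triples
        )
    return indices


tables = partition_by_predicate(TRIPLES)
indices = permutation_indices(TRIPLES)
```

Here `tables["worksAt"]` holds only the subject-object pairs for that predicate, while `indices` holds six sorted orderings (SPO, SOP, PSO, POS, OSP, OPS) of the same triples.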

In this thesis, we go a step further: we discover additional properties of RDF data and exploit them to process queries efficiently. Specifically: 1) We introduce two linkage structures, star linkage and chain linkage, extract this structural information, and build bitmap indices on it. 2) For each distinct predicate in the RDF dataset, we build a two-column table, with one column for subject values and the other for object values. This storage format is similar to the column store, but we build a different kind of index structure, called the Two-Dimension Compressed Bitmap Matrix. For each predicate, we maintain a two-dimensional matrix: one dimension records the object values for each subject, and the other records the subject values for each object. Additionally, we build one bitmap indicating which subjects have this predicate and one bitmap indicating which objects appear under it. We also use bitmaps to maintain connection information between different predicates. 3) We design an algorithm to estimate query selectivity, and use the estimate to select the query plan with better performance. We evaluate our approach on two RDF datasets, Billion Triple Challenge and Yago, using a variety of query types. Our approach outperforms RDF-3X and MonetDB, especially on queries with star or chain linkage information.
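The per-predicate subject and object bitmaps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the layout or compression scheme of the Two-Dimension Compressed Bitmap Matrix, so uncompressed Python integers stand in for bitmaps, terms are assumed to be dictionary-encoded as dense integer ids, and all names (`build_index`, `star_candidates`, `chain_candidates`, the toy triples) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not from the thesis.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "hkust"),
    ("bob", "worksAt", "hkust"),
    ("carol", "livesIn", "hongkong"),
]


def build_index(triples):
    """Build one subject bitmap and one object bitmap per predicate."""
    ids = {}  # term -> dense integer id (assumed dictionary encoding)

    def term_id(term):
        return ids.setdefault(term, len(ids))

    subj_bits = defaultdict(int)  # predicate -> bitmap of its subjects
    obj_bits = defaultdict(int)   # predicate -> bitmap of its objects
    for s, p, o in triples:
        subj_bits[p] |= 1 << term_id(s)
        obj_bits[p] |= 1 << term_id(o)
    return ids, subj_bits, obj_bits


def decode(bits, ids):
    """Translate a bitmap back into a sorted list of terms."""
    return sorted(term for term, i in ids.items() if (bits >> i) & 1)


def star_candidates(predicates, subj_bits):
    """Star linkage: subjects that carry every predicate in the list."""
    result = -1  # all bits set, so the first AND keeps the first bitmap
    for p in predicates:
        result &= subj_bits[p]
    return result


def chain_candidates(p1, p2, subj_bits, obj_bits):
    """Chain linkage: terms that are an object of p1 and a subject of p2."""
    return obj_bits[p1] & subj_bits[p2]


ids, subj_bits, obj_bits = build_index(TRIPLES)
star = decode(star_candidates(["knows", "worksAt"], subj_bits), ids)
chain = decode(chain_candidates("knows", "livesIn", subj_bits, obj_bits), ids)
```

In this sketch a star pattern reduces to a bitwise AND of subject bitmaps, and a chain pattern to an AND of one predicate's object bitmap with the next predicate's subject bitmap, which hints at why queries with these shapes can be filtered cheaply before any join is executed.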


Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Computer Science and Engineering
Authors: Lin, Yincheng
Subjects: Web site development; Semantic Web; Metadata
Language: English
Call number: Thesis CSED 2011 LinY
DOI: 10.14711/thesis-b1154818
