THESIS
2011
ix, 35 p. : ill. ; 30 cm
Abstract
RDF (Resource Description Framework), developed by the W3C, is a data description format
for the Semantic Web. As the Semantic Web develops, RDF data integrated from many sources
keeps growing. Because of its large volume and lack of a fixed schema, processing RDF data
efficiently remains a major challenge for RDF data management.
Much research has been carried out to address this problem. The property-table approach
discovers correlations among predicates and stores related data in the same table, so that
queries can be processed much as they are in a relational database. The column-store
approach instead focuses on each individual predicate: it partitions the RDF data into
separate tables, one per predicate, and builds indices on each table. RDF-3X, a RISC-style
engine for managing RDF data efficiently, keeps the data in its original triple format and
builds indices on all possible permutations of subject, predicate, and object.
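Since a triple has three components, "all possible permutations" means 3! = 6 orderings (SPO, SOP, PSO, POS, OSP, OPS), so a triple pattern with any combination of bound components can be answered by a prefix range scan on an ordering that starts with those components. The Python sketch below is illustrative only: the toy triples and the helper names `build_permutation_indexes` and `prefix_scan` are hypothetical, and it uses in-memory sorted lists rather than the compressed, dictionary-encoded B+-trees that RDF-3X actually uses.

```python
from bisect import bisect_left
from itertools import permutations

# Hypothetical toy triples (subject, predicate, object).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "livesIn", "paris"),
]

POSITION = {"S": 0, "P": 1, "O": 2}


def build_permutation_indexes(triples):
    """One sorted list per ordering of (S, P, O): 3! = 6 indexes in total."""
    indexes = {}
    for order in permutations("SPO"):
        keys = [tuple(t[POSITION[c]] for c in order) for t in triples]
        indexes["".join(order)] = sorted(keys)
    return indexes


def prefix_scan(index, prefix):
    """Range scan: every key in a sorted index that starts with `prefix`."""
    start = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    matches = []
    for key in index[start:]:
        if key[:len(prefix)] != prefix:
            break
        matches.append(key)
    return matches


if __name__ == "__main__":
    idx = build_permutation_indexes(TRIPLES)
    print(sorted(idx))  # ['OPS', 'OSP', 'POS', 'PSO', 'SOP', 'SPO']
    # ?s worksAt acme: P and O are bound, so scan the POS ordering.
    print([k[2] for k in prefix_scan(idx["POS"], ("worksAt", "acme"))])  # ['alice', 'bob']
```

For the pattern `?s worksAt acme`, predicate and object are bound, so the POS ordering answers it with a single range scan. The price is storing every triple six times, which is why RDF-3X compresses its indexes.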
In this thesis, we go a step further: we discover latent properties of RDF data and make
full use of them to process queries efficiently. Specifically, 1) we introduce two linkage
structures, star linkage and chain linkage; we extract these two kinds of structural
information and build bitmap indices on them. 2) For each distinct predicate in the RDF
dataset, we build a two-column table, one column for subject values and the other for
object values. This storage format is similar to the column store.
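A minimal sketch of this per-predicate layout (often called vertical partitioning), assuming plain string terms and in-memory lists: each distinct predicate gets its own two-column (subject, object) table, sorted by subject. The helper `chain_join` is a hypothetical illustration of why connections between predicates matter: a pattern such as `?x knows ?y . ?y worksAt ?z` must join one table's object column with another table's subject column. It uses a simple hash join and is not the thesis's algorithm.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
]


def partition_by_predicate(triples):
    """One two-column (subject, object) table per distinct predicate, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def chain_join(tables, p1, p2):
    """Hypothetical helper for ?x p1 ?y . ?y p2 ?z: hash-join p1's object column
    with p2's subject column."""
    objects_by_subject = defaultdict(list)
    for s, o in tables.get(p2, []):
        objects_by_subject[s].append(o)
    return [(x, y, z)
            for x, y in tables.get(p1, [])
            for z in objects_by_subject.get(y, [])]


if __name__ == "__main__":
    tables = partition_by_predicate(TRIPLES)
    print(tables["knows"])  # [('alice', 'bob'), ('bob', 'carol')]
    print(chain_join(tables, "knows", "worksAt"))
    # [('alice', 'bob', 'acme'), ('bob', 'carol', 'initech')]
```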
However, we build a different kind of index structure, called the Two-Dimension Compressed
Bitmap Matrix. For each predicate, we maintain a two-dimensional matrix: one dimension
records the object values of each subject, and the other records the subject values of each
object. Additionally, we build one bitmap indicating which subjects have this predicate and
one bitmap indicating which objects occur under it. We also use bitmaps to maintain
connection information between different predicates. 3) We design an algorithm to estimate
query selectivity, and based on these estimates we select the query plan that offers better
performance. We evaluate our approach on two RDF datasets, Billion Triple Challenge and
YAGO, with queries of various kinds. Compared with RDF-3X and MonetDB, our approach performs
better, especially for queries that involve star linkage or chain linkage.
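The abstract does not give the exact layout of the Two-Dimension Compressed Bitmap Matrix, its linkage bitmaps, or the selectivity estimator, so the following Python sketch is only one plausible reading under stated assumptions: all subjects and objects are dictionary-encoded into one shared integer ID space; bitmaps are plain, uncompressed Python ints (a real implementation would compress them); star linkage between two predicates is taken to be the AND of their subject bitmaps, and chain linkage the AND of one predicate's object bitmap with the other's subject bitmap; and `estimate_chain` counts join results from bitmap popcounts as a stand-in estimator (exact for a two-pattern chain). Every class and function name here is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples; resource and predicate names are illustrative only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
]


def popcount(bits):
    """Number of set bits in an int-encoded bitmap."""
    return bin(bits).count("1")


class PredicateBitmaps:
    """Hypothetical per-predicate index: a 2-D bit matrix kept both row-wise
    (subject -> objects) and column-wise (object -> subjects), plus one bitmap of
    all subjects and one of all objects. Bitmaps are uncompressed Python ints."""

    def __init__(self):
        self.objects_of = defaultdict(int)   # subject id -> bitmap of object ids
        self.subjects_of = defaultdict(int)  # object id -> bitmap of subject ids
        self.subject_bits = 0                # subjects that have this predicate
        self.object_bits = 0                 # objects that occur under this predicate

    def add(self, s, o):
        self.objects_of[s] |= 1 << o
        self.subjects_of[o] |= 1 << s
        self.subject_bits |= 1 << s
        self.object_bits |= 1 << o


def build_index(triples):
    """Dictionary-encode terms into one shared ID space, then index per predicate."""
    ids = {}
    index = defaultdict(PredicateBitmaps)
    for s, p, o in triples:
        s_id = ids.setdefault(s, len(ids))
        o_id = ids.setdefault(o, len(ids))
        index[p].add(s_id, o_id)
    return ids, index


def decode(ids, bits):
    """Turn a bitmap back into a sorted list of term names."""
    names = {i: term for term, i in ids.items()}
    return sorted(names[i] for i in range(bits.bit_length()) if (bits >> i) & 1)


def star_linkage(index, p1, p2):
    """Assumed star linkage: subjects that carry both p1 and p2."""
    return index[p1].subject_bits & index[p2].subject_bits


def chain_linkage(index, p1, p2):
    """Assumed chain linkage: terms that are an object of p1 and a subject of p2."""
    return index[p1].object_bits & index[p2].subject_bits


def estimate_chain(index, p1, p2):
    """Stand-in estimator for ?x p1 ?y . ?y p2 ?z: for each surviving join key y,
    multiply p1's in-degree at y by p2's out-degree at y (exact for two patterns)."""
    keys, y, total = chain_linkage(index, p1, p2), 0, 0
    while keys:
        if keys & 1:
            total += popcount(index[p1].subjects_of[y]) * popcount(index[p2].objects_of[y])
        keys >>= 1
        y += 1
    return total


def choose_join_order(index, pairs):
    """Greedy plan choice (assumed): evaluate the cheapest-looking chain first."""
    return sorted(pairs, key=lambda pair: estimate_chain(index, *pair))


if __name__ == "__main__":
    ids, index = build_index(TRIPLES)
    print(decode(ids, star_linkage(index, "knows", "worksAt")))   # ['alice', 'bob']
    print(decode(ids, chain_linkage(index, "knows", "worksAt")))  # ['bob', 'carol']
    print(choose_join_order(index, [("knows", "worksAt"), ("knows", "knows")]))
    # [('knows', 'knows'), ('knows', 'worksAt')]
```

On the toy data, the star linkage of `knows` and `worksAt` decodes to `alice` and `bob`, the two people who both know someone and have an employer, and `choose_join_order` places the `knows`/`knows` chain (estimated 1 result) ahead of `knows`/`worksAt` (estimated 2). Intersecting bitmaps first lets a planner discard join keys that cannot match before it touches any (subject, object) pairs; the thesis maintains such connection bitmaps, whereas this sketch computes them on demand.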