THESIS
2013
ix, 56 p. : ill. ; 30 cm
Abstract
Keyword query is more user-friendly than the structured query (SQL, XQuery, SPARQL)
as it requires no pre-knowledge about the underlying schema of the databases. In the recent
decade there has been much work focusing on the keyword search in both the structured
databases and unstructured databases. Particularly, since the labeled graph model
can generally present both the structured and unstructured databases, graph keyword
search has been a hot topic in the database community. Furthermore, with the rapid
development of semantic web, the graph data has grown to a huge scale and has to be
stored and processed in a distributed environment. However, no existing approach can be
directly applied to answer the keyword queries in the distributed graphs. The challenges
exist in two-fol...[
Read more ]
Keyword query is more user-friendly than the structured query (SQL, XQuery, SPARQL)
as it requires no pre-knowledge about the underlying schema of the databases. In the recent
decade there has been much work focusing on the keyword search in both the structured
databases and unstructured databases. Particularly, since the labeled graph model
can generally present both the structured and unstructured databases, graph keyword
search has been a hot topic in the database community. Furthermore, with the rapid
development of semantic web, the graph data has grown to a huge scale and has to be
stored and processed in a distributed environment. However, no existing approach can be
directly applied to answer the keyword queries in the distributed graphs. The challenges
exist in two-fold, the poor locality exists in the keyword query answers may lead to high
communication cost for graph exploration and the huge traffic load for shifting the data
among different machines. To address these two challenges, in this thesis, we first propose
to optimize the index in order to minimal the distributed query processing cost within
the space budget on each machine; Then, based on this indexing scheme we introduce an
advanced distributed query processing algorithm, which includes an optimal scheduling
problem defined to reduce traffic and the time cost, and we propose a greedy algorithm
with a factor-2 approximation to the optimal schedule problem; At last, we verify the
effectiveness and efficiency of our solution with large scale real datasets.
Post a Comment