Efficient structural query processing in XML databases
by Haifeng Jiang
Ph.D. Computer Science
xiv, 125 leaves : ill. ; 30 cm
As XML gains unqualified success through being adopted as a universal data exchange format, particularly in the World Wide Web, more and more information is being stored, exchanged, and presented in it. The ability to intelligently query XML data sources, therefore, becomes increasingly important....[ Read more ]
As XML gains unqualified success through being adopted as a universal data exchange format, particularly in the World Wide Web, more and more information is being stored, exchanged, and presented in it. The ability to intelligently query XML data sources, therefore, becomes increasingly important.
This thesis studies the query processing of a core subset of XML query languages: twig queries with and without OR-predicates. An XML twig query, represented as a labeled query tree, is essentially a complex selection on both the structure and the content of an XML document. Matching a twig query means finding all the instances of the query tree embedded in the data tree that represents the queried XML document. While querying by content can leverage traditional database technologies, evaluating the structural relationships (specifically parent-child and ancestor-descendant relationships) specified in twig queries has imposed a great challenge to existing database technologies.
We present in this thesis a series of holistic twig join algorithms by which query trees are matched not edge by edge but as a whole so that irrelevant intermediate results can be greatly reduced. Specifically, we first present a merge-based algorithm and an index-based algorithm for processing twig queries without OR-predicates. The algorithms access underlying XML data through a generic data access interface so that they are independent of specific physical storage methods. We then extend these two algorithms to handle general twig queries that can contain OR-predicates. To provide an efficient data access interface for our algorithms, we propose a novel external-memory index structure, namely XR-tree, and detail the implementation of the data access interface for XR-tree indexed XML data. We conduct extensive experiments with our algorithms in comparison to existing state-of-art ones. Our algorithms clearly outperform the existing ones, therefore we recommend that they should be adopted by performance-critical XML query systems.