Efficient structural query processing in XML databases

HKUST Electronic Theses

Efficient structural query processing in XML databases

by Haifeng Jiang

THESIS 2004

Ph.D. Computer Science

xiv, 125 leaves : ill. ; 30 cm

Abstract

As XML gains unqualified success through being adopted as a universal data exchange format, particularly in the World Wide Web, more and more information is being stored, exchanged, and presented in it. The ability to intelligently query XML data sources, therefore, becomes increasingly important.

This thesis studies the query processing of a core subset of XML query languages: twig queries with and without OR-predicates. An XML twig query, represented as a labeled query tree, is essentially a complex selection on both the structure and the content of an XML document. Matching a twig query means finding all the instances of the query tree embedded in the data tree that represents the queried XML document. While querying by content can leverage traditional database technologies, evaluating the structural relationships (specifically parent-child and ancestor-descendant relationships) specified in twig queries has imposed a great challenge to existing database technologies.

We present in this thesis a series of holistic twig join algorithms by which query trees are matched not edge by edge but as a whole so that irrelevant intermediate results can be greatly reduced. Specifically, we first present a merge-based algorithm and an index-based algorithm for processing twig queries without OR-predicates. The algorithms access underlying XML data through a generic data access interface so that they are independent of specific physical storage methods. We then extend these two algorithms to handle general twig queries that can contain OR-predicates. To provide an efficient data access interface for our algorithms, we propose a novel external-memory index structure, namely XR-tree, and detail the implementation of the data access interface for XR-tree indexed XML data. We conduct extensive experiments with our algorithms in comparison to existing state-of-art ones. Our algorithms clearly outperform the existing ones, therefore we recommend that they should be adopted by performance-critical XML query systems.

[ Hide abstract ]

View Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent. Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection HKUST Electronic Theses Degree Ph.D. Department Computer Science Authors Jiang, Haifeng Subjects XML (Document markup language) Query languages (Computer science) Language English Call number Thesis COMP 2004 Jiang DOI 10.14711/thesis-b837445

Full record

Efficient structural query processing in XML databases

by Haifeng Jiang

Post a Comment Cancel reply