THESIS
2004
xiii, 120 leaves : ill. ; 30 cm
Abstract
This dissertation is about developing advanced query processing and estimation techniques for database systems managing XML data. More specifically, an important operator in such systems, structural join, is studied. The following two issues related to the current trends in database research are addressed: efficient processing structural joins and estimating the result size of a structural join....[
Read more ]
This dissertation is about developing advanced query processing and estimation techniques for database systems managing XML data. More specifically, an important operator in such systems, structural join, is studied. The following two issues related to the current trends in database research are addressed: efficient processing structural joins and estimating the result size of a structural join.
Extensible Markup Language (XML) has become the de facto standard for data representation and exchange over the Internet. It has enabled and stimulated a great multitude of research and applications. However, unique features of XML and its query languages have posed great challenges to efficient managing and querying large volumes of XML data. This, in turn, hinders progress of XML research and the development of applications based on XML. This dissertation makes the attempt to enhance the efficiency of XML database management system by advanced query processing and optimization techniques.
We first consider the efficient processing of structural joins for XML data. A structural join takes two sets of XML nodes as input and returns pairs of nodes such that a special ancestor-descendant relationship holds between them. Structural join is widely accepted as an core operator in XML query processing. An efficient and robust structural query processing framework based on a novel coding scheme, PBiTree coding, is proposed. The PBiTree code enables efficient checking of the ancestor-descendant relationship between two nodes solely based on their PBiTree codes. We present algorithms in the framework that are optimized for various combinations of physical settings. In particular, the newly proposed partitioning based algorithms can process structural joins efficiently without sorting or indexing. Experimental results demonstrate that the structural join processing algorithms based on the proposed coding scheme outperform existing algorithms significantly.
Next, we study the result size estimation problem of structural joins. Estimating the size of structural join result is essential to generating efficient XML query processing plans in an XML query optimizer. We propose two models, the interval model and the position model, under which the original estimation problem can be converted into estimating the size of a spatial join and a relational equijoin respectively. A set of estimation methods based on the histogram and sampling techniques are developed, which have not only high accuracy but also theoretical guarantees on the estimation. Comprehensive performance studies are conducted. The results demonstrate that the accuracy and robustness of our newly proposed estimation methods outperforms those of the previous method up to an order of magnitude.
Post a Comment