THESIS
2005
84 leaves : ill. ; 30 cm
Abstract
Extensible Markup Language (XML) has become the de facto standard for data representation, exchange and publishing over the Internet. As more and more XML data is gathered and processed by database management systems, we need innovative approaches to efficiently manage and query it. However, XML queries possess some unique features, which distinguish them from the traditional RDBMS queries and pose a challenge to the XML research community. This dissertation investigates the problem of how to improve the XML path expression performance by considering the unique features of path expression, since path expression is the core part of XML queries. By fully exploiting these features, we are able to propose some novel algorithms to significantly improve XML path expression performance....[
Read more ]
Extensible Markup Language (XML) has become the de facto standard for data representation, exchange and publishing over the Internet. As more and more XML data is gathered and processed by database management systems, we need innovative approaches to efficiently manage and query it. However, XML queries possess some unique features, which distinguish them from the traditional RDBMS queries and pose a challenge to the XML research community. This dissertation investigates the problem of how to improve the XML path expression performance by considering the unique features of path expression, since path expression is the core part of XML queries. By fully exploiting these features, we are able to propose some novel algorithms to significantly improve XML path expression performance.
We first investigate the issue of optimizing XML path expressions without considering the workload. A basic kind of XML path expression is the structural query, which retrieves all descendants of a specific XML element. Structural queries are important because they constitute the building block of various complex XML path expressions. Among proposed XML query processing techniques for structural queries, structural join outperforms most other graph-traversal based approaches as it minimizes the number of nodes accessed to evaluate a structural query. In the first part of this dissertation we show that general path expressions, involving conditions on attributes, possess inherent multi-dimensional characteristics that are not captured by existing XML query processing methods based on 1D ordering. Motivated by this, we index XML data with multi-dimensional access methods and develop efficient algorithms to minimize the query cost. Extensive experimental evaluations confirm that our algorithms provide significant performance gains for numerous query types.
Our next goal is to further improve the path expression performance by considering the query workload. All existing indices for structural join focus on indexing the encoding information of XML elements so as to improve the XML path expression performance. Consequently, such indices utilize only data characteristics, but ignore query characteristics that are important for the further improvement of the query performance. In this thesis, we propose AC-tree (Adaptive Cluster B
+-tree), which is a fully workload-aware structural join index, to provide a simple, but efficient way to exploit the XML query characteristics for performance improvement. To the best of our knowledge, AC-tree is the first structural join index to take the workload into consideration. An extensive set of experiments confirm that AC-tree outperforms competitors significantly for both simple (i.e., non-branching) XML path expressions and branching path expressions.
Post a Comment