THESIS
2009
x, 41 p. : ill. ; 30 cm
Abstract
XML documents have a tree-like hierarchical structure design and XML has become popular as the standard data format for exchanging information over the Internet. Recent research has explored issues on querying different types of metadata associated with an XML document, including annotations, quality assessments, security policies, etc. The metadata considered by many of these approaches can be modeled as values from a totally ordered domain (e.g. recency using timestamps). A natural model for annotating XML data with metadata is to associate metadata explicitly only with selected element nodes in the XML data tree. For elements where the metadata is not explicitly specified, it is inherited from the nearest ancestor where it is explicitly specified. We show that such a metadata model c...[
Read more ]
XML documents have a tree-like hierarchical structure design and XML has become popular as the standard data format for exchanging information over the Internet. Recent research has explored issues on querying different types of metadata associated with an XML document, including annotations, quality assessments, security policies, etc. The metadata considered by many of these approaches can be modeled as values from a totally ordered domain (e.g. recency using timestamps). A natural model for annotating XML data with metadata is to associate metadata explicitly only with selected element nodes in the XML data tree. For elements where the metadata is not explicitly specified, it is inherited from the nearest ancestor where it is explicitly specified. We show that such a metadata model can be converted to a 3-sided 2D range query problem. Thus, we present an external binary priority search tree which is a data structure for querying XML with inherited attributes I/O-efficiently. We can build our structure in O(N/B log
2N) I/Os, answer a query in O(log
2N + T / B) I/Os and return an update in O(log
2N) I/Os, where B is the disk block size, N is the size of the XML document, and T is the size of the query output. We have also demonstrated the practical efficiency of our index structure with experiments on benchmark XML data.
Post a Comment