THESIS
2008
x, 101 p. : ill. ; 30 cm
Abstract
Due to its ease of use, relational keyword search (R-KWS) has become increasingly popular. Compared to SQL, it has several benefits. First, R-KWS hides the schema from users and allows searching for combinations of terms without knowing in which data sources they appear. Second, its queries are easy to express. Third, many search tasks only become feasible through R-KWS. However, R-KWS is a young concept that holds many challenges. First, it lacks common semantics, and different implementations encounter various pitfalls. Second, its simplicity comes at a high computational cost. Specifically, R-KWS explores a vast search space, composed of all possible combinations of keyword occurrences in any attribute of every table. There are two general methodologies for query processing: (i) grap...[
Read more ]
Due to its ease of use, relational keyword search (R-KWS) has become increasingly popular. Compared to SQL, it has several benefits. First, R-KWS hides the schema from users and allows searching for combinations of terms without knowing in which data sources they appear. Second, its queries are easy to express. Third, many search tasks only become feasible through R-KWS. However, R-KWS is a young concept that holds many challenges. First, it lacks common semantics, and different implementations encounter various pitfalls. Second, its simplicity comes at a high computational cost. Specifically, R-KWS explores a vast search space, composed of all possible combinations of keyword occurrences in any attribute of every table. There are two general methodologies for query processing: (i) graph based (GB), traversing a materialized data graph, and (ii) operator based (OB), executing a set of relational operator trees. In both cases, a large portion of the computations are wasted on traversals/operators that fail to return results. Finally, while data streams become increasingly important to our information landscapes, R-KWS has been restrained to static data. The contribution of this paper is thus threefold. First, it provides homogenized semantics and matching query processing methods (GB as well as OB) that are formally shown to be correct. Second, it introduces a comprehensive framework for reachability indexing that quickly collapses the search space, and leads to significant savings. Third, it extends R-KWS to continuous queries over relational data streams. In particular, it proposes temporal semantics, as well as matching query processing techniques (GB as well as OB) that are further optimized. Extensive experiments demonstrate the high practicability of R-KWS, on tables as well as streams, and the significant benefits of the proposed optimizations.
Post a Comment