THESIS
2022
1 online resource (xii, 120 pages) : illustrations (some color)
Abstract
Conjunctive queries form the backbone of SQL queries, and have been extensively studied
in the literature. In this thesis, we consider two important extensions of a basic conjunctive
query. First, we study how to evaluate conjunctive queries with predicates in the form of
comparisons that span multiple relations. Such predicates have regained interest, due to their
relevance in OLAP systems, spatiotemporal databases, and machine learning over relational
data. We design a new algorithm for evaluating conjunctive queries with comparisons and
identify an acyclic condition under which linear query processing time can be achieved. We
also extend our acyclic query algorithm to more general queries, resulting in polynomial-factor
improvements over previous best results for many cyclic queries....[
Read more ]
Conjunctive queries form the backbone of SQL queries, and have been extensively studied
in the literature. In this thesis, we consider two important extensions of a basic conjunctive
query. First, we study how to evaluate conjunctive queries with predicates in the form of
comparisons that span multiple relations. Such predicates have regained interest, due to their
relevance in OLAP systems, spatiotemporal databases, and machine learning over relational
data. We design a new algorithm for evaluating conjunctive queries with comparisons and
identify an acyclic condition under which linear query processing time can be achieved. We
also extend our acyclic query algorithm to more general queries, resulting in polynomial-factor
improvements over previous best results for many cyclic queries.
Secondly, we study the problem of incrementally maintaining the results of a conjunctive
query under updates. Prior work has shown that this problem is inherently hard in the
worst case. To provide an instance-specific analysis of the update cost, we introduce a new
measure of complexity of the update sequence, which we call the enclosureness, which is a
small constant for many realistic update sequences, such as first-in-first-out (FIFO) sequence.
We present an algorithm to maintain the query results of any acyclic foreign-key join query
in time linear in the enclosureness.
We also extend the results on acyclic foreign-key joins to non-key joins. By revisiting the classical change propagation framework, we propose a new change propagation framework
without using joins, thus naturally avoiding the polynomial blowup caused by joins. Meanwhile,
we show that the new framework still supports constant-delay enumeration of both
the deltas and the full query results, the same as in the standard framework. Furthermore,
we extend the concept of enclosureness and provide a quantitative analysis of its update cost,
which not only recovers many recent theoretical results on the problem, but also yields an
effective approach to optimizing the query plan.
All newly proposed algorithms are efficient not only in theory but also in practice. We
have implemented them on Spark and Flink, and our experimental results demonstrate order of-magnitude speedups against the previous state-of-the-art results.
Post a Comment