THESIS
2022
1 online resource (xi, 118 pages) : illustrations (some color)
Abstract
Synopses, also known as summaries or sketches, are data structures computed from
a dataset to support analytical queries. Traditionally, the purpose of using synopses is
efficiency. By allocating some extra space, queries can be answered approximately but
efficiently by a precomputed synopsis without referencing the original dataset. Recently,
privacy has been a new concern besides efficiency. As the current standard for privacy
analysis, differential privacy (DP) enjoys the property that post-processing a DP mechanism
brings no extra privacy loss. This is in line with the principle of synopses: by
constructing a differentially private synopsis from the dataset, many queries can be answered
accurately while satisfying a predefined privacy constraint at construction time.
In this thesis,...[
Read more ]
Synopses, also known as summaries or sketches, are data structures computed from
a dataset to support analytical queries. Traditionally, the purpose of using synopses is
efficiency. By allocating some extra space, queries can be answered approximately but
efficiently by a precomputed synopsis without referencing the original dataset. Recently,
privacy has been a new concern besides efficiency. As the current standard for privacy
analysis, differential privacy (DP) enjoys the property that post-processing a DP mechanism
brings no extra privacy loss. This is in line with the principle of synopses: by
constructing a differentially private synopsis from the dataset, many queries can be answered
accurately while satisfying a predefined privacy constraint at construction time.
In this thesis, we focus on three problems related to synopses construction. We start
with the cardinality estimation problem for Select-Project-Join queries under the nonprivate
setting. We propose a synopsis constructed through weighted distinct sampling
according to near-optimal sampling rates, and show an efficient way of constructing the
it. We demonstrate its optimality through both theoretical and empirical evaluation.
We then consider numerical queries on static datasets under differential privacy, which
capture Select-Aggregate queries in database systems. A private synopsis is designed to
achieve both query-specific and instance-specific error, which outperforms existing solutions
both in theory and in practice. Finally, We extend the setting to streaming data. A
private synopsis is designed for linear queries, which are generalizations of Select queries in
databases. We show our synopsis for fully-dynamic streams has asymptotically the same
error as answering queries in static settings, ignoring polylogarithmic factors in the length
of the stream. The results can also be extended to arbitrary union-preserving queries.
Post a Comment