THESIS
2023
1 online resource (xi, 135 pages) : illustrations (some color)
Abstract
Differential privacy (DP) has garnered significant attention from both academia and
industry due to its potential in offering robust privacy protection for individual data
during analysis. With the increasing volume of sensitive information being collected by
organizations and analyzed through SQL queries, the development of a general-purpose
query engine that is capable of supporting a broad range of SQLs while maintaining
differential privacy has become the holy grail in privacy-preserving query release. Over
the past several years, the database and security communities have made numerous efforts
to achieve this objective. However, two major challenges still exist:
Challenge 1: Privacy guarantee in relational databases Informally speaking, DP
requires indistinguishability of the query...[
Read more ]
Differential privacy (DP) has garnered significant attention from both academia and
industry due to its potential in offering robust privacy protection for individual data
during analysis. With the increasing volume of sensitive information being collected by
organizations and analyzed through SQL queries, the development of a general-purpose
query engine that is capable of supporting a broad range of SQLs while maintaining
differential privacy has become the holy grail in privacy-preserving query release. Over
the past several years, the database and security communities have made numerous efforts
to achieve this objective. However, two major challenges still exist:
Challenge 1: Privacy guarantee in relational databases Informally speaking, DP
requires indistinguishability of the query results whether any particular individual’s data
is included or not in the database. While this can be easily defined and well studied over
a single flat table, the situation becomes more complex in a relational database with the
presence of multiple relations, foreign keys, and the join operator.
Challenge 2: Optimal privacy-utility trade-off Noise injection is inherently necessary
for privacy protection, but the central scientific question is how to achieve the lowest
error (i.e., highest utility) under a given privacy budget. Unfortunately, for the problem
of relational query evaluation under DP, traditional notions of optimality, i.e., instance
optimality and worst-case optimality, are either unattainable or meaningless. Thus, a new
notion of optimality is needed for answering this question.
In this thesis, we aim at tackling these challenges. To model the complex situation
of relational databases, we study two different DP policies: tuple-DP, which protects the
privacy of single tuples in each relation, and user-DP, which protects all data belonging
to each user via foreign keys. Under each policy, we have designed DP mechanisms for
answering a broad class of queries consisting of the selection, projection, aggregation, and
join operators. To formalize their optimality, we introduce the notion of neighborhood
optimality, which sits between instance optimality and worst-case optimality. Finally,
based on the algorithms and theory developed, we have built a DP-SQL system that
significantly outperforms existing systems in terms of both utility and efficiency.
Post a Comment