THESIS
2022
1 online resource (x, 91 pages) : illustrations (some color)
Abstract
The past few years have seen major advances in machine learning and data analysis, driven by access to substantial amounts of user data. However, the collected personal information is often highly sensitive. Differential privacy is a rigorous mathematical definition for privacy-preserving data publishing and has been widely adopted.
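To make the definition concrete, a single counting query can be answered under differential privacy with the standard Laplace mechanism. The sketch below is a minimal illustration, not a mechanism from the thesis; the function names are ours.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Answer a counting query under epsilon-differential privacy.

    A counting query has global sensitivity 1 (adding or removing one
    record changes the count by at most 1), so Laplace noise with
    scale 1/epsilon suffices.
    """
    true_count = sum(1 for x in data if predicate(x))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise
```

With a small privacy budget epsilon the noise scale 1/epsilon is large, which is exactly why answering many correlated range queries naively incurs large error.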
This thesis focuses on the fundamental problem of range counting under differential privacy, i.e., computing how many points lie inside any given region from a certain range space, and on its applications. The query family consists of geometric ranges of a certain type, e.g., points, rectangles, spheres, or arbitrarily shaped regions.

First, we study the range counting problem in the centralized model, where a trusted curator holds the entire dataset. Existing lower bounds based on discrepancy theory suggest that large errors must be introduced in order to preserve privacy: essentially, for any range space (except axis-parallel rectangles), the error has to be polynomial. We show that, by allowing a standard notion of geometric approximation in which points near the boundary of the range may or may not be counted, the error can be reduced to logarithmic.

Next, we study the mean estimation problem under differential privacy, where worst-case optimal mechanisms offer no meaningful utility guarantees in practice when the global sensitivity is very large. We take a principled approach, yielding a mechanism that is instance-optimal in a strong sense. The mechanism is based on private quantile estimation, which in turn relies on range counting mechanisms.

Finally, we study the frequency estimation problem under both privacy and communication constraints in the multiparty model, where the (one-shot or streaming) data is distributed among multiple parties. Our protocols achieve the optimality (up to polylogarithmic factors) permitted by the more stringent of the two constraints.
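As a toy illustration of frequency estimation under both constraints, and not one of the thesis's protocols, k-ary randomized response lets each party send a single symbol (log k bits of communication) while satisfying epsilon-local differential privacy; the aggregator then debiases the noisy counts. All names below are ours.

```python
import numpy as np

def k_rr_encode(value, k, epsilon, rng):
    """k-ary randomized response: report the true symbol with
    probability e^eps / (e^eps + k - 1), else a uniformly random
    other symbol. Each party sends only one symbol."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    # Report a uniformly random symbol *other* than the true one.
    other = rng.integers(k - 1)
    return other if other < value else other + 1

def k_rr_estimate(reports, k, epsilon):
    """Unbiased frequency estimates from the noisy reports."""
    n = len(reports)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    p_other = 1.0 / (np.exp(epsilon) + k - 1)
    counts = np.bincount(reports, minlength=k)
    # E[counts[j]/n] = f_j * p_keep + (1 - f_j) * p_other,
    # so inverting the affine map recovers an unbiased estimate of f_j.
    return (counts / n - p_other) / (p_keep - p_other)
```

The estimates always sum to exactly 1, since p_keep + (k - 1) * p_other = 1; the per-symbol variance grows as epsilon shrinks, reflecting the privacy-utility trade-off the abstract refers to.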