THESIS
2022
1 online resource (xviii, 149 pages) : illustrations (some color)
Abstract
Differential privacy (DP) has recently arisen as an important topic in machine learning, driven by the demand to protect sensitive personal data. Yet various open problems remain in scenarios where data is collected in an online and/or heterogeneously distributed fashion. In this thesis, we address some open problems in these two scenarios.
In the first part of this thesis, we study differentially private stochastic convex optimization (DP-SCO) with i.i.d. streaming data under a continual release requirement, known as the online setting. Although numerous algorithms have been developed to achieve the optimal excess risk in different non-Euclidean ℓp (1 ≤ p ≤ ∞) norm geometries, none of the existing ones can be adapted to the streaming and continual release setting. To address this challenge, we propose a private variant of a common framework, the online Frank-Wolfe algorithm with recursive gradients. Combined with an adaptive differential privacy analysis, our online algorithm achieves in linear time the optimal excess risk when 1 < p ≤ 2, and a state-of-the-art excess risk matching the non-private lower bounds when 2 < p < ∞. Our algorithm can also be extended to the case p = 1 to achieve a nearly dimension-independent excess risk.
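To make the high-level description concrete, the following is a minimal sketch, not the thesis's actual algorithm, of an online Frank-Wolfe loop with a recursive (momentum-style, variance-reduced) gradient estimate and Gaussian noise added for privacy, over an ℓ2 ball. The mixing weight rho, the noise scale, and the step size eta are illustrative placeholders; the full calibration of the noise and the privacy accounting are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def lmo_l2(g, R=1.0):
    """Linear minimization oracle over the l2 ball of radius R:
    argmin_{||v|| <= R} <g, v> = -R * g / ||g||."""
    n = np.linalg.norm(g)
    return -R * g / n if n > 0 else np.zeros_like(g)

def dp_online_frank_wolfe(grad_fn, data_stream, dim, R=1.0,
                          rho=0.5, noise_scale=0.1, eta=0.1):
    """One pass over the stream: maintain a recursive gradient estimate d,
    privatize it with Gaussian noise, and take a Frank-Wolfe step."""
    x = np.zeros(dim)
    x_prev = x.copy()
    d = np.zeros(dim)
    for t, z in enumerate(data_stream):
        g = grad_fn(x, z)
        if t == 0:
            d = g
        else:
            # recursive estimator: correct the previous estimate with the
            # gradient difference on the fresh sample z
            d = g + (1.0 - rho) * (d - grad_fn(x_prev, z))
        d_noisy = d + rng.normal(0.0, noise_scale, size=dim)  # Gaussian mechanism
        v = lmo_l2(d_noisy, R)
        x_prev = x.copy()
        x = (1.0 - eta) * x + eta * v  # convex combination stays in the ball
    return x
```

Because each iterate is a convex combination of points in the ℓ2 ball, all iterates remain feasible without any projection, which is what makes Frank-Wolfe attractive in the streaming, linear-time regime.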
In the second part of this thesis, we consider differentially private federated learning (DP-Fed), where multiple data owners, referred to as clients, cooperatively learn a useful model without disclosing their sensitive data. A key new observation is that the federated average of gradients is often empirically smooth, i.e., sparse in the Fourier basis with polynomially decaying coefficients; based on this, we apply Laplacian smoothing (DP-Fed-LS) to reduce variance and improve the estimates of such gradients. The utility improvements of our algorithm with i.i.d. and heterogeneously distributed data are demonstrated through comprehensive numerical experiments. In particular, under heterogeneous data distributions, we provide tight convergence rates and communication-complexity bounds, matching those for federated learning without differential privacy, or those for empirical risk minimization (ERM) via SGD with differential privacy in the centralized setting.
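As an illustration of the Laplacian smoothing idea, a common construction from the Laplacian smoothing SGD literature (not necessarily the thesis's exact operator) premultiplies a noisy gradient by (I - σΔ)^{-1}, where Δ is the 1-D periodic graph Laplacian. This operator is diagonal in the Fourier basis, so it can be applied with an FFT in O(d log d) time, and it damps exactly the high-frequency components where injected DP noise concentrates. The sigma value below is an illustrative placeholder.

```python
import numpy as np

def laplacian_smooth(g, sigma=1.0):
    """Apply (I - sigma * Laplacian)^{-1} to g via the FFT.

    For the 1-D periodic graph Laplacian, the Fourier multiplier is
    1 + 2*sigma*(1 - cos(2*pi*k/d)) >= 1, so every frequency component
    is shrunk, with stronger shrinkage at high frequencies."""
    d = len(g)
    # first row of (I - sigma * Laplacian): the circulant stencil [-2, 1, ..., 1]
    v = np.zeros(d)
    v[0] = -2.0
    v[1] = 1.0
    v[-1] = 1.0
    denom = 1.0 - sigma * np.fft.fft(v)  # = 1 + 2*sigma*(1 - cos(2*pi*k/d))
    return np.real(np.fft.ifft(np.fft.fft(g) / denom))
```

Since each Fourier multiplier is at least 1, Parseval's identity gives ||smoothed g|| ≤ ||g||, which is the variance-reduction effect exploited by DP-Fed-LS; setting sigma = 0 recovers the raw gradient.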