THESIS
2022
1 online resource (xviii, 149 pages) : illustrations (some color)
Abstract
Differential privacy (DP) has recently arisen as an important topic in machine learning, driven by the demand to protect sensitive personal data. Yet various open problems remain in scenarios where data is collected in an online and/or heterogeneously distributed fashion. In this thesis, we address some open problems in these two scenarios.
In the first part of this thesis, we study differentially private stochastic convex optimization (DP-SCO) with i.i.d. streaming data under a continual release requirement, known as the online setting. Although numerous algorithms have been developed to achieve the optimal excess risk in different non-Euclidean ℓp (1 ≤ p ≤ ∞) norm geometries, none of the existing ones can be adapted to the streaming and continual release setting. To address this challenge, we propose a private variant of a common framework, the online Frank-Wolfe algorithm with recursive gradients. Combined with an adaptive differential privacy analysis, our online algorithm achieves in linear time the optimal excess risk when 1 < p ≤ 2, and a state-of-the-art excess risk matching the non-private lower bounds when 2 < p < ∞. Our algorithm can also be extended to the case p = 1 to achieve a nearly dimension-independent excess risk.
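To make the high-level description concrete, the following is a minimal sketch, not the thesis's actual algorithm, of an online Frank-Wolfe loop with a recursive (momentum-style, variance-reduced) gradient estimate and Gaussian noise added for privacy, over an ℓ2 ball. The mixing weight rho, the noise scale, and the step size eta are illustrative placeholders; the full calibration of the noise and the privacy accounting are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def lmo_l2(g, R=1.0):
    """Linear minimization oracle over the l2 ball of radius R:
    argmin_{||v|| <= R} <g, v> = -R * g / ||g||."""
    n = np.linalg.norm(g)
    return -R * g / n if n > 0 else np.zeros_like(g)

def dp_online_frank_wolfe(grad_fn, data_stream, dim, R=1.0,
                          rho=0.5, noise_scale=0.1, eta=0.1):
    """One pass over the stream: maintain a recursive gradient estimate d,
    privatize it with Gaussian noise, and take a Frank-Wolfe step."""
    x = np.zeros(dim)
    x_prev = x.copy()
    d = np.zeros(dim)
    for t, z in enumerate(data_stream):
        g = grad_fn(x, z)
        if t == 0:
            d = g
        else:
            # recursive estimator: correct the previous estimate with the
            # gradient difference on the fresh sample z
            d = g + (1.0 - rho) * (d - grad_fn(x_prev, z))
        d_noisy = d + rng.normal(0.0, noise_scale, size=dim)  # Gaussian mechanism
        v = lmo_l2(d_noisy, R)
        x_prev = x.copy()
        x = (1.0 - eta) * x + eta * v  # convex combination stays in the ball
    return x
```

Because each iterate is a convex combination of points in the ℓ2 ball, all iterates remain feasible without any projection, which is what makes Frank-Wolfe attractive in the streaming, linear-time regime.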
In the second part of this thesis, we consider differentially private federated learning (DP-Fed), where multiple data owners, referred to as clients, cooperatively learn a useful model without disclosing their sensitive data. A key new observation is that the federated average of gradients is often empirically smooth, i.e., sparse in the Fourier basis with polynomially decaying coefficients; based on this, we apply Laplacian smoothing (DP-Fed-LS) to reduce variance and improve the estimates of such gradients. The utility improvements of our algorithm with i.i.d. and heterogeneously distributed data are demonstrated through comprehensive numerical experiments. In particular, under heterogeneous data distributions, we provide tight convergence rates and communication-complexity bounds, matching those for federated learning without differential privacy, or those for empirical risk minimization (ERM) via SGD with differential privacy in the centralized setting.
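As an illustration of the Laplacian smoothing idea, a common construction from the Laplacian smoothing SGD literature (not necessarily the thesis's exact operator) premultiplies a noisy gradient by (I - σΔ)^{-1}, where Δ is the 1-D periodic graph Laplacian. This operator is diagonal in the Fourier basis, so it can be applied with an FFT in O(d log d) time, and it damps exactly the high-frequency components where injected DP noise concentrates. The sigma value below is an illustrative placeholder.

```python
import numpy as np

def laplacian_smooth(g, sigma=1.0):
    """Apply (I - sigma * Laplacian)^{-1} to g via the FFT.

    For the 1-D periodic graph Laplacian, the Fourier multiplier is
    1 + 2*sigma*(1 - cos(2*pi*k/d)) >= 1, so every frequency component
    is shrunk, with stronger shrinkage at high frequencies."""
    d = len(g)
    # first row of (I - sigma * Laplacian): the circulant stencil [-2, 1, ..., 1]
    v = np.zeros(d)
    v[0] = -2.0
    v[1] = 1.0
    v[-1] = 1.0
    denom = 1.0 - sigma * np.fft.fft(v)  # = 1 + 2*sigma*(1 - cos(2*pi*k/d))
    return np.real(np.fft.ifft(np.fft.fft(g) / denom))
```

Since each Fourier multiplier is at least 1, Parseval's identity gives ||smoothed g|| ≤ ||g||, which is the variance-reduction effect exploited by DP-Fed-LS; setting sigma = 0 recovers the raw gradient.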