THESIS
2024
1 online resource (xvii, 142 pages) : illustrations (some color)
Abstract
Over the past decade, machine learning has shifted from cloud data centers to edge devices. To protect the privacy of raw data, many large companies have adopted federated learning (FL) for tasks such as computer vision and natural language processing across client devices. Despite its embedded principle of data minimization, FL still puts clients' privacy at risk due to loopholes in its commonly used protocols, including secure aggregation and distributed differential privacy (DP). Moreover, current FL systems suffer from suboptimal training efficiency, primarily due to the heterogeneity of hardware and data among clients, which is further exacerbated by the use of the aforementioned protocols. This dissertation aims to enhance the privacy and efficiency of FL by tackling these fundamental challenges.
First, we improve training efficiency in the presence of client heterogeneity. We present Pisces, an asynchronous training system that sidesteps the awkward tradeoff between prioritizing fast clients and prioritizing clients with high-quality data. Pisces also effectively mitigates stale computation, yielding a notable speedup in overall training.
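As a rough illustration of these two ingredients, the sketch below scores clients by blending data quality with speed, and damps stale updates rather than discarding them. The function names, the loss-based quality proxy, and the 1/(1 + staleness) discount are illustrative assumptions, not Pisces's exact formulas.

    def client_utility(data_quality: float, latency_s: float,
                       alpha: float = 0.5) -> float:
        """Blend data quality with speed to rank clients.
        data_quality could be a running average of training loss
        (higher loss = more left to learn); latency_s is the
        client's measured round time. All names are hypothetical.
        """
        speed = 1.0 / max(latency_s, 1e-6)
        return alpha * data_quality + (1.0 - alpha) * speed

    def staleness_discount(staleness: int) -> float:
        """Weight for an update computed on a global model that is
        `staleness` versions old: damped, not discarded."""
        return 1.0 / (1.0 + staleness)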
Second, we address the privacy and efficiency problems of model aggregation with distributed DP. We introduce Dordis, which precisely enforces the required level of random noise in the aggregated model even in the presence of client dropout, thereby safeguarding clients' privacy. Dordis also runs as a pipeline-parallel system, efficiently hiding the computation and communication costs incurred by cryptographic primitives.
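To see why dropout threatens distributed DP, consider the minimal sketch below: if each client's noise share is sized for the full cohort, losing clients mid-round leaves the aggregate under-protected, so shares must be provisioned for the worst-case survivor count. This illustrates the problem only; it is not Dordis's actual noise-enforcement protocol, and the names and dropout-bound parameter are assumptions.

    def per_client_sigma(sigma_target: float, n_clients: int,
                         max_dropout: int) -> float:
        """Split a DP noise target of std sigma_target across clients.
        Dividing sigma_target**2 by n_clients under-protects the sum
        whenever clients drop out; sizing each share for the worst-case
        survivor count keeps the aggregate noise at or above the target
        (any excess could then, in principle, be removed once the actual
        dropout is known). Hypothetical sketch.
        """
        survivors = n_clients - max_dropout
        assert survivors > 0, "dropout bound must leave survivors"
        return (sigma_target ** 2 / survivors) ** 0.5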
Third, we focus on the privacy issue faced by secure aggregation and distributed DP when a malicious server colludes with compromised clients. We devise Lotto, a security framework that effectively prevents the server from manipulating client selection to attack the aforementioned protocols. Additionally, Lotto features a lightweight design that minimally affects training efficiency.
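One way to picture unmanipulable selection: each client decides its own participation from a keyed, publicly checkable computation, so the server cannot hand-pick compromised clients. In the sketch below, HMAC stands in for a verifiable random function (VRF); the interface and parameter names are assumptions, not Lotto's actual design.

    import hashlib
    import hmac

    def self_select(client_key: bytes, round_seed: bytes,
                    sample_rate: float) -> bool:
        """Decide locally whether this client is sampled this round.
        A real deployment would use a VRF whose output anyone can
        verify against the client's public key; HMAC-SHA256 is a
        stand-in here. round_seed is a per-round public value.
        """
        digest = hmac.new(client_key, round_seed, hashlib.sha256).digest()
        # Map the first 8 bytes to a uniform value in [0, 1).
        value = int.from_bytes(digest[:8], "big") / 2 ** 64
        return value < sample_rate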