THESIS
2023
1 online resource (xiii, 59 pages) : illustrations (chiefly color)
Abstract
In recent years, public concern over data privacy has led to an increasing emphasis on
privacy preservation. Cross-silo Federated Learning (FL) has been applied in both academia
and industry to connect data holders who are isolated by laws and regulations. However,
the security protocols that FL uses to guarantee data security introduce considerable overhead
because of their high computational complexity and large data sizes, which prevents FL from being efficient in real-life applications.
This thesis proposes hardware acceleration designs that target the high-performance computation
of nine widely used cryptographic operations in FL, including homomorphic encryption
and the RSA algorithm. Compared to traditional CPU approaches, our hardware designs
leverage the abundant computation and storage resources on hardware devices such as GPUs,
FPGAs, and ASICs. Our solution consists of two parts. First, we present a GPU-based acceleration
design, HAFLO, for federated logistic regression. This design enables mainstream FL
frameworks to better utilize GPU devices. Second, we propose FLASH, a specially designed
hardware acceleration architecture for FL. FLASH accelerates cryptographic operations with
fully pipelined computation engines and a data flow scheduling module. We implement FLASH
on the VU13P FPGA for prototyping and assess the performance of the ASIC design of FLASH.
Our evaluation results show that the FPGA prototype achieves 6.8x and 2.0x speedups for FL
applications over CPU and GPU, respectively. The ASIC design of FLASH further achieves
23.6x acceleration over the FPGA prototype.