THESIS
2025
1 online resource (xviii, 197 pages) : illustrations (chiefly color)
Abstract
As artificial intelligence continues to permeate various domains, the challenges of data scarcity and privacy preservation become increasingly significant. Federated Learning (FL) provides a framework for collaborative model training across organizations while protecting data privacy. In particular, Vertical Federated Learning (VFL) addresses the distinctive challenges that arise when data are vertically partitioned among participants. However, VFL faces heightened privacy risks and inefficiencies stemming from the pervasive exposure of data and model information. This thesis proposes a minimum-exposure approach to trustworthy VFL that strategically identifies and exposes only the minimum-necessary information, thereby optimizing the trade-offs among multiple objectives of trustworthiness, including privacy, utility, robustness, and efficiency. The thesis categorizes information exposure into data exposure and model parameter exposure.

First, we address intra-sample label exposure in VFL with a two-phase framework: offline-phase cleansing and training-phase perturbation. Our proposed Label Privacy Source Coding (LPSC) encodes the minimum-necessary label information in the offline phase; adversarial training then further enhances privacy during training.

Second, we explore a more challenging VFL scenario with arbitrarily aligned samples across parties. To tackle this challenge, we introduce the Complementary Knowledge Distillation (CKD) framework, which enables privacy-preserving knowledge transfer among passive parties while minimizing intra-sample information exposure.

Third, we address inter-sample information exposure with a secure vertical federated dataset condensation (VFDC) framework, which efficiently condenses the entire real dataset in VFL into a small synthetic dataset, reducing the inter-sample information exposure that could compromise privacy while maintaining model utility.

Finally, we tackle model parameter exposure in heterogeneous federated transfer learning with PP-HFTL, a privacy-preserving framework that transfers knowledge securely using cryptographic methods. PP-HFTL integrates the transferred models to reduce parameter exposure and to allow purely local inference, eliminating the need for secure cross-party inference.

Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of our approaches, which outperform existing baselines across these objectives.
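To make the intra-sample label exposure problem concrete, the following is a minimal sketch of a two-party VFL training step in PyTorch, assuming a split-model setup in which a passive party sends embeddings and the active party holds the labels. The model names, sizes, and the simple Gaussian noise are illustrative assumptions; the sketch shows where label information leaks through the returned gradients and where a training-phase perturbation would apply, not the LPSC encoding itself.

```python
# Minimal two-party VFL round illustrating where label information leaks:
# the gradients the active party returns to a passive party are a function
# of the labels. A noise term sketches training-phase perturbation.
# All names and sizes here are illustrative, not from the thesis.
import torch
import torch.nn as nn

passive_bottom = nn.Linear(16, 8)           # passive party: features only
active_top = nn.Linear(8, 2)                # active party: holds the labels
criterion = nn.CrossEntropyLoss()

x_passive = torch.randn(32, 16)             # passive party's feature slice
y = torch.randint(0, 2, (32,))              # labels, seen only by the active party

h = passive_bottom(x_passive)               # embedding sent to the active party
h_active = h.detach().requires_grad_(True)  # active party's view of it
loss = criterion(active_top(h_active), y)
loss.backward()

grad = h_active.grad                        # gradient returned to the passive party:
                                            # this is the intra-sample label exposure
noisy_grad = grad + 0.1 * torch.randn_like(grad)  # training-phase perturbation (sketch)
h.backward(noisy_grad)                      # passive party updates its bottom model
```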
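The knowledge-transfer step underlying a framework like CKD can be sketched with a standard temperature-scaled distillation loss: a passive party learns from another party's soft predictions rather than its raw features, so only output-level information crosses the party boundary. The function name, temperature, and shapes below are assumptions for illustration; the thesis's complementary distillation scheme is more involved.

```python
# Sketch of a knowledge-distillation step: a passive party matches another
# party's temperature-softened outputs, so only the minimum-necessary
# output information is exchanged. Shapes and T=2.0 are assumptions.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Standard KL-based distillation loss on temperature-softened outputs."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

teacher_logits = torch.randn(32, 2)     # received from the other passive party
student_logits = torch.randn(32, 2, requires_grad=True)
loss = distill_loss(student_logits, teacher_logits)
loss.backward()
```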
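For intuition about dataset condensation, here is a minimal single-party sketch of gradient matching, a common condensation technique in which a small synthetic set is optimized so that its training gradients mimic those of the real data. The secure, vertically partitioned machinery of VFDC is not modeled here; the network size, step count, and synthetic-set size are arbitrary assumptions.

```python
# Sketch of dataset condensation by gradient matching: learn a tiny
# synthetic set whose gradients on a network match those of the real
# data, so downstream training never touches the real samples.
# Single-party, non-secure version for illustration only.
import torch
import torch.nn as nn

net = nn.Linear(16, 2)
criterion = nn.CrossEntropyLoss()
params = tuple(net.parameters())

x_real = torch.randn(256, 16)
y_real = torch.randint(0, 2, (256,))
g_real = torch.autograd.grad(criterion(net(x_real), y_real), params)

x_syn = torch.randn(8, 16, requires_grad=True)   # 8 synthetic samples
y_syn = torch.randint(0, 2, (8,))
opt = torch.optim.Adam([x_syn], lr=0.01)

for _ in range(100):
    g_syn = torch.autograd.grad(criterion(net(x_syn), y_syn),
                                params, create_graph=True)
    opt.zero_grad()
    sum(((a - b) ** 2).sum() for a, b in zip(g_syn, g_real)).backward()
    opt.step()
```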
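Finally, one standard cryptographic building block for limiting model parameter exposure is additive secret sharing: each party splits its parameters into random shares so that only the integrated sum is ever reconstructed and no single party's parameters are revealed. The sketch below is a generic illustration under a fixed-point encoding assumption, not PP-HFTL's actual protocol.

```python
# Sketch of additive secret sharing over a prime field: shares of each
# party's parameter look uniformly random, and only the aggregate is
# reconstructed. The 3-party setup and fixed-point scale are assumptions.
import random

P = 2**61 - 1          # Mersenne prime modulus for share arithmetic
SCALE = 10**6          # fixed-point scaling for float parameters

def share(value, n_parties):
    """Split an integer-encoded value into n additive shares mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def encode(x):
    return int(round(x * SCALE)) % P

def decode(v):
    return (v if v <= P // 2 else v - P) / SCALE

# Each party secret-shares one parameter; only the sum is reconstructed.
params = [0.25, -0.10, 0.40]                      # one parameter per party
all_shares = [share(encode(p), 3) for p in params]
sum_shares = [sum(col) % P for col in zip(*all_shares)]
integrated = decode(sum(sum_shares) % P)
print(integrated)                                  # 0.55; no single parameter revealed
```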