THESIS
2020
1 online resource (ix, 37 pages) : illustrations (some color)
Abstract
Federated learning (FL) has emerged as an elegant privacy-preserving distributed machine
learning (ML) paradigm. In particular, vertical FL (VFL) is a promising approach for
collaborating organizations that hold data on the same set of users but with disjoint
features to jointly train models without leaking their private data to each other.
As the volume of training data and the model size increase rapidly, each organization may
deploy a cluster of many servers to participate in the federation. As such, the intra-party
communication cost (i.e., network transfers within each organization's cluster) can significantly
impact the entire VFL job's performance. Despite this, existing FL frameworks
use the inefficient gRPC for intra-party communication, leading to high latency and high
CPU cost. In this paper, we propose a design to transmit data with RDMA for intra-party
communication, with no modifications to applications. To improve the network efficiency,
we further propose an RDMA usage arbiter that dynamically adjusts the RDMA bandwidth
used by a non-straggler party, and a query data size optimizer that automatically finds
the optimal query data size for each response to carry. Our preliminary results show
that RDMA-based intra-party communication is 10x faster than its gRPC-based counterpart,
reducing the completion time of a VFL job by 9%. Moreover, the RDMA usage
arbiter can save over 90% bandwidth, and the query data size optimizer can improve the
transmission speed by 18%.
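
One way to relate the two headline numbers above (a 10x faster intra-party transfer and a 9% shorter job) is a back-of-the-envelope Amdahl's-law estimate. The sketch below is not taken from the thesis: it assumes that only intra-party communication is accelerated, that every other phase of the job is unchanged, and that communication does not overlap with computation. The function names are hypothetical.

```python
def job_time_reduction(comm_fraction: float, speedup: float) -> float:
    """Fractional reduction in total job time when a phase that takes
    `comm_fraction` of the baseline time is made `speedup` times faster.

    Amdahl's-law model (an assumption, not the thesis's own analysis):
    new_time = (1 - f) + f / s, so reduction = f * (1 - 1 / s).
    """
    return comm_fraction * (1.0 - 1.0 / speedup)


def implied_comm_fraction(speedup: float, time_reduction: float) -> float:
    """Invert the model above: the share of baseline job time the
    accelerated phase must have had to yield `time_reduction`."""
    return time_reduction / (1.0 - 1.0 / speedup)


# Figures quoted in the abstract: 10x faster transfers, 9% shorter job.
f = implied_comm_fraction(speedup=10.0, time_reduction=0.09)
print(f"implied intra-party communication share: {f:.0%}")
```

Under these assumptions, the reported figures are consistent with intra-party communication accounting for roughly a tenth of the baseline job time. The actual share in the evaluated workload may differ if the phases overlap.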