THESIS
2022
1 online resource (xiv, 139 pages) : illustrations (some color)
Abstract
Recent advances in artificial intelligence (AI) applications rely on massive amounts of
training data. In practice, these valuable data are distributed among multiple independent data
owners (e.g., companies and individuals), each of whom typically holds only a modest amount, and
the data are usually heterogeneous. The conventional, straightforward solution is to collect data
from individual users or acquire it from data owners. However, such solutions are becoming
untenable owing to growing concerns about data privacy and security. AI systems therefore face the
problem of learning from fragmented, diverse data held separately by many data owners.
Federated learning (FL), a novel privacy-preserving collaborative machine learning paradigm,
has been proposed to address the problem of learning from small, privately isolated datasets. Its
main idea is to form a federation of data owners in which all participants virtually pool their data
without sacrificing data security and privacy. Federated learning faces several challenges,
including communication efficiency, data security and privacy protection, and statistical learning.
Among these, the statistical learning challenge posed by heterogeneous data significantly degrades
the performance of FL systems and thus hinders FL’s practical application. In recent years,
researchers have developed a machine learning paradigm known as transfer learning, which leverages
heterogeneous data from related domains to solve the statistical learning problem in a target domain
with limited or no data. This naturally motivates us to bring the spirit of transfer learning into
federated learning to overcome the statistical learning difficulty in practical FL.
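The federation idea can be illustrated with a minimal sketch of federated averaging (FedAvg), a widely used FL baseline; it is not one of the methods proposed in this thesis. Each hypothetical client keeps its (x, y) pairs locally, runs a few epochs of full-batch gradient descent on a linear model y = w * x + b, and the server only averages the returned parameters, weighted by local dataset size. The function names, the ground truth y = 2x + 1, the client data, and the hyperparameters are illustrative assumptions. Because all clients here share one underlying function, plain averaging still converges; the harder case this thesis targets, where heterogeneous clients pull the model toward conflicting optima, is not shown.

```python
from typing import List, Tuple

# A client's private dataset: (x, y) pairs that never leave the client.
Dataset = List[Tuple[float, float]]
Params = Tuple[float, float]  # (w, b) for the model y = w * x + b


def local_train(params: Params, data: Dataset, lr: float, epochs: int) -> Params:
    """Full-batch gradient descent on mean squared error, run on one client."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(clients: List[Dataset], rounds: int, lr: float = 0.1,
            local_epochs: int = 5) -> Params:
    """FedAvg-style training: clients train locally, the server averages parameters."""
    global_params: Params = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client starts from the current global model and trains on its own data.
        updates = [local_train(global_params, d, lr, local_epochs) for d in clients]
        # The server sees only model parameters, weighted by each client's dataset size.
        w = sum(len(d) * u[0] for d, u in zip(clients, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(clients, updates)) / total
        global_params = (w, b)
    return global_params


def make_client(xs: List[float]) -> Dataset:
    # Hypothetical noise-free ground truth y = 2x + 1, shared by all clients.
    return [(x, 2 * x + 1) for x in xs]


# Three hypothetical clients whose inputs cover different ranges (covariate shift).
clients = [
    make_client([0.0, 0.25, 0.5, 0.75, 1.0]),
    make_client([1.0, 1.5, 2.0]),
    make_client([-1.0, -0.5]),
]
w, b = fed_avg(clients, rounds=200)
print(f"w = {w:.3f}, b = {b:.3f}")  # should approach w = 2.000, b = 1.000
```

Weighting by dataset size makes one round with a single local epoch equivalent to one gradient step on the pooled data, which is why the sketch recovers the shared parameters without any client revealing its raw (x, y) pairs.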
In this thesis, we focus on federated transfer learning, a class of federated learning methods
that employ transfer learning methodology to tackle the statistical learning difficulty posed by
heterogeneous data. Unlike other federated learning approaches, which presume that the datasets held
by data owners are independently and identically distributed (IID), federated transfer learning
explicitly addresses data heterogeneity across data owners in practice and achieves superior
performance in such settings.
The thesis consists of two parts. First, we provide a brief overview of federated learning, including
its concept, evolution, and categorization, and cover its statistical learning challenges in depth.
We offer a precise categorization of the federated learning algorithms that address these challenges,
which we collectively refer to as federated transfer learning. We then examine representative
existing works and incorporate them into our proposed federated transfer learning architecture.
Second, we identify three typical data heterogeneity scenarios in federated learning that arise in
practical applications and investigate how our proposed federated transfer learning methods overcome
the challenges they pose. We believe that these methods hold great promise for broadening the
application of federated learning.