THESIS
2018
xv, 99 pages : illustrations ; 30 cm
Abstract
Fair allocation of network resources for data-parallel applications is a challenging undertaking.
Conflicts between ever-increasing traffic volumes and limited link bandwidth are
becoming increasingly intense. Moreover, the distributed nature of data-parallel applications gives rise to
a unique correlated traffic pattern in which a job is considered complete only when its
coflow (the flows of all its constituent tasks) has finished. In the face of these challenges, this thesis
presents a systematic study of how to ensure the progress of network communication for
data-parallel applications.
Our first insight is that data locality should be fully exploited to reduce network transfers,
thus alleviating link contention and accelerating application progress. We propose Custody,
a cluster management framework that non-intrusively retrieves locality information on input
data blocks and assigns machines with local data to applications in a fair fashion by solving
the data-aware resource sharing problem.
Even with data locality in hand, however, network transfers are still inevitable and often
enormous. Network isolation should therefore be provided so that the worst-case
performance of each service is assured. We propose Libra, a solution that maximizes the network isolation guarantee by adjusting the placement of task containers.
Our next endeavor is to navigate the trade-off between fairness and efficiency for data-parallel
applications. Fairness ensures the progress of each application, but it can
impede overall performance measures such as the average coflow completion time (CCT). To bridge
this gap, we design a new coflow scheduler, Coflex, that exposes a tunable fairness knob to
adjust the isolation guarantee while using the remaining bandwidth to decrease the
average CCT.
Finally, because a growing number of applications must finish within deadlines, we shift
our focus towards deadline-aware scheduling. We present Chronos, a scheduling framework
that captures the deadline-aware semantics and allocates network resources among multiple
concurrent coflow applications.