THESIS
2017
xv, 109 pages : illustrations ; 30 cm
Abstract
Data-parallel computing frameworks are designed to support the processing of
large volumes of data in computing clusters for big data analytics applications,
such as search engines, personalized recommendation, video analytics and graph
processing. Due to the distributed nature of big data analytics, computation and
network resources are both critical to individual job performance and overall
system throughput. There is therefore a pressing need to coordinate the allocation
of network bandwidth and the scheduling of computation tasks.
This thesis addresses the allocation of both network and computation resources
through delay-aware bandwidth allocation schemes and network-aware task scheduling
frameworks. Specifically, we make the following three contributions.
First, we design Tailor, a dynamic monitoring and routing system that reduces
network transfer times between successive computation stages of a job (captured
as the coflow completion time). Tailor is transparent to data-parallel applications
and requires minimal modifications to end hosts. For clusters where only edge
networks experience severe and persistent congestion, we identify a non-trivial
tradeoff between coflow performance and network utilization. Through in-depth
analysis, we show that achieving work conservation is not sufficient to maximize
the utilization of access links. We then propose a hierarchical bandwidth allocation
framework, Adia, that maximizes link utilization while achieving near-optimal
coflow performance.
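As background for the coflow abstraction used throughout this work, a coflow
finishes only when its slowest constituent flow finishes. The following is a
minimal illustrative sketch (not the Tailor or Adia implementation; the flow
sizes and rates are hypothetical) that computes the coflow completion time under
a given bandwidth allocation:

    # Illustrative sketch: the coflow completion time (CCT) is the maximum
    # of the per-flow finish times, since a coflow ends with its last flow.
    def coflow_completion_time(flow_sizes_mb, allocated_rates_mb_per_s):
        # flow_sizes_mb: data volume of each flow in the coflow (MB)
        # allocated_rates_mb_per_s: bandwidth allocated to each flow (MB/s)
        finish_times = [size / rate
                        for size, rate in zip(flow_sizes_mb, allocated_rates_mb_per_s)]
        return max(finish_times)

    # Example (hypothetical values): three shuffle flows of one coflow under
    # an equal-share allocation; the largest flow dominates the CCT.
    sizes = [400.0, 100.0, 50.0]
    rates = [100.0, 100.0, 100.0]
    print(coflow_completion_time(sizes, rates))  # 4.0 seconds

This also hints at why allocating leftover bandwidth to a coflow's smaller flows
(work conservation alone) does not necessarily shorten the coflow itself.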
Second, we propose to incorporate network-awareness into task scheduling, since
network communication remains a determining factor for job performance even with
state-of-the-art bandwidth allocation schemes. By introducing a novel network-aware
queueing model, we decouple the usage of network and computation resources and
thus accurately capture the total processing time of each task. We then propose
a network-aware scheduling algorithm, Adrestia, and prove that it is
throughput-optimal when the demands for network and computation resources are
known a priori.
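To make the decoupling concrete, the sketch below (an illustrative simplification,
not Adrestia's actual model; all rate parameters are hypothetical) estimates a
task's total processing time as a network-transfer phase followed by a
computation phase, each served by a distinct resource:

    # Illustrative sketch: decouple a task's time into a network phase and a
    # computation phase, so each resource can be tracked and scheduled separately.
    def task_processing_time(input_mb, link_rate_mb_per_s,
                             cpu_work_units, cpu_rate_units_per_s):
        network_time = input_mb / link_rate_mb_per_s            # fetch remote input
        compute_time = cpu_work_units / cpu_rate_units_per_s    # process the input
        return network_time + compute_time

    # Example (hypothetical values): a reduce task fetching 800 MB over a
    # 100 MB/s link, then performing 50 units of CPU work at 25 units/s.
    print(task_processing_time(800.0, 100.0, 50.0, 25.0))  # 8.0 + 2.0 = 10.0 s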
Finally, we propose an online scheduling framework, Symbiosis, that identifies
resource imbalance and coordinates computation-bound and network-bound tasks in
a large cluster, with the objective of fully utilizing all types of resources in
the cluster and maximizing system throughput. Symbiosis provides both a substrate
and an application programming interface (API) to support existing task schedulers
in data analytics frameworks. With network-awareness, our framework fully
considers network and computation resources, making task scheduling and bandwidth
allocation decisions based on live analysis of cluster state. We have implemented
Symbiosis on top of Spark and demonstrated that it improves both delay and
throughput on a real-world cloud testbed with diverse analytics workloads.
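The coordination idea can be illustrated with a minimal sketch (not the actual
Symbiosis scheduler or its Spark API; the task classification and machine model
are hypothetical): when choosing the next task for a machine, prefer the kind of
task whose bottleneck resource is currently under-utilized, so network-bound and
computation-bound tasks overlap rather than contend.

    # Illustrative sketch: launch the task whose bottleneck resource (CPU or
    # network) is currently the less utilized one on this machine.
    def pick_next_task(pending_tasks, cpu_util, net_util):
        # pending_tasks: list of (task_id, kind), kind in {"compute", "network"}
        # cpu_util, net_util: current CPU and NIC utilization, in [0, 1]
        preferred = "compute" if cpu_util <= net_util else "network"
        for task_id, kind in pending_tasks:
            if kind == preferred:
                return task_id
        # Fall back to any pending task if none of the preferred kind is waiting.
        return pending_tasks[0][0] if pending_tasks else None

    # Example (hypothetical values): the NIC is busy (0.9) but the CPU is mostly
    # idle (0.2), so a computation-bound task overlaps with ongoing transfers.
    tasks = [("t1", "network"), ("t2", "compute")]
    print(pick_next_task(tasks, cpu_util=0.2, net_util=0.9))  # -> "t2"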