THESIS
2018
xv, 94 pages : illustrations ; 30 cm
Abstract
Scale matters. In the era of big data, the unprecedented growth of data scale is fundamentally
transforming the way we make sense of it. With the rapid rise of cloud computing,
applications with massive input datasets are scaling out to thousands of machines to efficiently
exploit I/O parallelism.
As one of the major challenges introduced by these data-parallel applications, communication
among distributed tasks often results in massive data transfers over the network.
To address this problem, we observe continuous efforts in industry to build high-capacity,
low-latency datacenter networking infrastructure at scale. At the same time, we observe
concentrated efforts in academia to develop efficient network optimization mechanisms
for big data analytics.
However, from first-hand experience, we find efficient network optimization profoundly
challenging, especially when performed in a practical manner. First, application-aware
network scheduling using coflows is an important technique for improving application-level
communication performance. However, existing coflow-based solutions rely on modifying the
underlying computing frameworks to identify coflows, which makes them inapplicable
to many practical scenarios. Second, precise network load balancing is crucial
to ensure network schedules and resolve in-network bottlenecks. However, production
datacenters operate under various uncertainties such as traffic dynamics, topology
asymmetry, and failures. These uncertainties make network load balancing challenging
in practice.
Can we perform both efficient and practical network optimization for big data analytics?
This dissertation describes my research efforts to answer this in the affirmative.
First, we propose CODA, a practical application-aware network scheduling framework.
CODA makes the first attempt at automatically identifying and scheduling coflows without
any framework-level modification. It serves as a necessary and natural step towards
practical network optimization for big data applications. Second, we present Hermes, a
resilient load balancing scheme tailored for the dynamic and complex datacenter environment.
Hermes gracefully handles various kinds of uncertainties in a readily deployable
fashion.