THESIS
2014
xi, 101 pages : illustrations ; 30 cm
Abstract
As cloud and big data computation grow increasingly important as paradigms, providing a general abstraction for datacenter-scale programming has become a pressing research agenda. Researchers have proposed, designed, and implemented various computation models and systems at different abstraction levels, such as MapReduce, X10, Dryad, Storm, and Spark. However, many abstractions expose the distributed details of the platform to the application layer, leading to increased programming complexity, decreased performance, and, sometimes, loss of generality. At the data substrate layer, traditional cloud computing technologies, such as MapReduce, use disk-based file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems instead store data in DRAM and thereby improve the performance of cloud computing systems tremendously. However, both our own experience and related work indicate that simply substituting distributed DRAM for the file system does not provide a solid and viable foundation for data processing and storage in the datacenter environment; moreover, the capacity of such systems is limited by the amount of physical memory in the cluster.
To support general, efficient, flexible, and concurrent application workloads with sophisticated data processing, we present programmers with the illusion of a big virtual machine built on top of one, multiple, or many compute nodes, unifying the physical memory and disks of those nodes into a globally addressable data substrate. We design a new instruction set architecture, i0, that unifies myriad compute nodes into a big virtual machine called MAZE, where thousands of tasks run concurrently in VOLUME, a large, unified, and snapshotted distributed virtual memory. i0, MAZE, and VOLUME form the foundation of the Layer Zero systems, which provide a general substrate for cloud computing. i0 provides a simple yet general and scalable programming model. VOLUME mitigates the scalability bottleneck of traditional distributed shared memory systems and unifies the physical memory and disks of many compute nodes into a distributed transactional virtual memory. VOLUME provides a general memory-based abstraction, exploits DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to process and store large datasets by swapping data to disks and remote servers. Together with MAZE's efficient execution engine, this allows the capacity of a MAZE to scale up to support large datasets and large clusters.
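
To make the snapshotted, transactional semantics described above concrete, the following is a minimal, self-contained sketch of snapshot-based transactional memory in Python. It is an in-process toy under our own assumptions, not the thesis's implementation: the names Volume, Txn, begin, read, write, and commit are hypothetical, and the real VOLUME spans DRAM, disks, and remote servers rather than a single in-memory dictionary.

    # Toy sketch of snapshot-based transactional memory.
    # All names here are hypothetical illustrations; the actual
    # i0/VOLUME interface is not given in this abstract.

    class Txn:
        def __init__(self, volume, snapshot):
            self._volume = volume
            self._snapshot = snapshot   # immutable view taken at begin time
            self._writes = {}           # updates buffered until commit

        def read(self, addr):
            # Reads see this transaction's own writes, else the snapshot.
            if addr in self._writes:
                return self._writes[addr]
            return self._snapshot.get(addr)

        def write(self, addr, value):
            self._writes[addr] = value

        def commit(self):
            # Try to publish buffered writes atomically as a new snapshot.
            return self._volume._commit(self._snapshot, self._writes)

    class Volume:
        def __init__(self):
            self._current = {}          # latest committed snapshot

        def begin(self):
            return Txn(self, self._current)

        def _commit(self, base, writes):
            # First-committer-wins: abort if another transaction has
            # committed since this transaction's snapshot was taken.
            if base is not self._current:
                return False
            self._current = {**self._current, **writes}
            return True

    vol = Volume()
    t = vol.begin()
    t.write("x", 42)
    assert t.commit()                   # publishes a new snapshot
    assert vol.begin().read("x") == 42  # later transactions see it

The first-committer-wins check in _commit stands in for whatever conflict-detection policy the real system uses; the point of the sketch is only that concurrent tasks each see a consistent snapshot and publish updates atomically.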
We have implemented the Layer Zero systems on several platforms, and have designed and implemented various benchmarks, graph processing and machine learning programs, and application frameworks. Our evaluation shows that Layer Zero delivers excellent performance and scalability. On one physical host, its overhead is comparable to that of traditional VMMs. On 16 physical hosts, Layer Zero runs 10 times faster than Hadoop and X10. On 160 physical compute servers, Layer Zero scales linearly on a typical iterative workload.