THESIS
2015
xv, 225 pages : illustrations ; 30 cm
Abstract
In this study, multiple cost-saving options for computing clusters specifically
designed for big-data analytic systems are investigated. The ultimate goal is
to allow the computing clusters to provide more economical software hosting
services for big-data analytic applications through hardware multiplexing, economies
of scale, data sharing and so on. We are particularly interested in the
research problems in the areas of virtual machine (VM) workload consolidation,
job scheduling in big-data analytic frameworks and data deduplication in
data storage systems. In particular, we seek to answer the following questions:
(1) How should we group and assign virtual machines in order to
minimize the cost of the data center? (2) How should we schedule
jobs in a big-data analytic system according to their time budget?
(3) How should files be distributed and stored across multiple servers
in order to eliminate as much data redundancy as possible? By combining the
answers to the above questions, we aim to produce advanced management systems
that help the big-data application users reduce their overall operating
cost without jeopardizing the quality of service of their applications.
To produce insightful designs, we mainly rely on advanced discrete optimization
and graph-theoretic techniques to make intelligent decisions
that take system dynamics into account. For example, Lagrangian relaxation
and M-convex optimization techniques are applied to solve the VM workload
consolidation problem, while total unimodularity and robust optimization
techniques are applied to perform job scheduling for big-data analytic systems
in real time. A design principle that we value heavily is that the
proposed solutions must consist of lightweight distributed algorithms that can
be implemented in real-world systems. Our proposals rest on
non-trivial theoretical foundations.
To evaluate the practicality of the proposed systems, we built prototypes
upon representative real-world platforms. For performance evaluations that
require a large-scale data center, we developed simulation programs. Our
results suggest that our proposed systems are efficient, agile and robust. We
believe that after fine tuning, the prototype systems can be powerful tools for
existing big-data analytic software.