THESIS
2022
1 online resource (xv, 130 pages) : illustrations (some color)
Abstract
Witnessing the soaring demand for computation over the past decade, tech companies have been amassing numerous commodity machines to serve requests from massive user bases. Such large-scale multi-tenant clusters, with optimized resource scheduling, have the potential to be highly efficient. In practice, however, it is challenging to achieve both high performance and low cost. Given heterogeneous hardware and diverse workloads, many schedulers either suffer from low resource utilization, which increases cost, or cause heavy workload contention, which degrades performance.
In this dissertation, starting with a characterization study of a production cluster, we present the challenges posed to resource scheduling: low resource utilization, hard-to-schedule tasks demanding high-end GPUs, imbalanced load across machines, and severe contention on CPU resources. Packing and balancing are the two major approaches to tackling these issues. Bin-packing consolidates workloads onto fewer servers, accommodating demanding tasks and improving resource utilization. Load-balancing scatters tasks over the cluster, mitigating contention and boosting workload performance.
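The two approaches can be contrasted as opposite placement policies. The following sketch (not from the dissertation; node capacities and task demands are illustrative) applies each to a toy CPU-only cluster:

```python
def pack(free, demand):
    """Bin-packing (best-fit): pick the feasible node with the LEAST
    remaining capacity after placement, consolidating load."""
    feasible = [n for n in free if free[n] >= demand]
    return min(feasible, key=lambda n: free[n] - demand) if feasible else None

def balance(free, demand):
    """Load-balancing (worst-fit): pick the feasible node with the MOST
    remaining capacity, spreading load to reduce contention."""
    feasible = [n for n in free if free[n] >= demand]
    return max(feasible, key=lambda n: free[n] - demand) if feasible else None

free = {"m1": 8, "m2": 4, "m3": 6}   # free CPU cores per machine (assumed)
print(pack(free, 3))     # m2: tightest fit, keeps m1/m3 open for large tasks
print(balance(free, 3))  # m1: most headroom, minimizes contention
```

The same task thus lands on different machines under the two policies, which is exactly the utilization-versus-contention trade-off described above.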
Following the packing method towards higher utilization, we find resource fragmentation to be a major obstacle, especially in GPU-sharing clusters, where conventional bin-packing is unviable: the scheduling of GPU-sharing tasks that request a partial GPU cannot be modeled as a classic bin-packing problem, due to the discrete and interchangeable nature of GPU resources. We therefore take a new approach towards high utilization by minimizing fragmentation. We quantify the degree of GPU fragmentation statistically and use this metric to guide scheduling. We propose a novel scheduling heuristic called Fragmentation Gradient Descent (FGD), which consistently outperforms a variety of packing-based schedulers and utilizes hundreds of additional GPUs in large-scale cluster emulations driven by production traces.
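The idea can be illustrated with a minimal sketch. The fragmentation measure and task-demand distribution below are assumptions for illustration, not the dissertation's exact formulation: fragmentation is quantified statistically against a demand distribution, and each task is placed where fragmentation grows least.

```python
# Popularity-weighted distribution of per-task GPU demands (assumed).
DEMANDS = [(0.3, 0.5), (0.6, 0.3), (1.0, 0.2)]  # (GPU fraction, probability)

def frag(gpus):
    """Expected free GPU capacity that a randomly sampled task cannot use:
    a per-GPU fragment smaller than the task's demand is wasted for it."""
    return sum(p * sum(f for f in gpus if f < d) for d, p in DEMANDS)

def fgd_place(nodes, demand):
    """Place `demand` on the GPU whose node sees the smallest fragmentation
    increase, i.e. follow the steepest fragmentation-descent direction."""
    best = None
    for name, gpus in nodes.items():
        for i, free in enumerate(gpus):
            if free >= demand:
                after = gpus[:i] + [free - demand] + gpus[i + 1:]
                delta = frag(after) - frag(gpus)
                if best is None or delta < best[0]:
                    best = (delta, name, i)
    return best and best[1:]

# Each node lists per-GPU free fractions (illustrative).
cluster = {"n1": [1.0, 0.4], "n2": [0.7, 0.7]}
# A 0.6-GPU task goes to n2: consuming n1's whole GPU would strand
# capacity needed by future full-GPU tasks.
print(fgd_place(cluster, 0.6))
```

Note how this differs from best-fit packing, which would greedily fill the fullest node regardless of which demand sizes the leftover fragments can still serve.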
Following the balancing method towards better performance, we study the placement of long-running application (LRA) containers. LRAs, with stringent performance requirements, are difficult to schedule due to their complex resource interference and I/O dependencies. Existing schedulers, which avoid contention by minimizing violations of placement constraints, fall short in performance, as manually expressed constraints provide only qualitative scheduling guidelines. We therefore design Metis, a data-driven scheduling system that learns to optimally place LRA containers using deep reinforcement learning (RL). Metis eliminates the complex manual specification of placement constraints and offers concrete, quantitative scheduling criteria. Enhanced by hierarchical learning techniques, Metis scales to large clusters and substantially increases workload throughput in real deployments on the public cloud.
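To make the learning-to-place idea concrete, here is a deliberately tiny sketch using tabular Q-learning on a toy two-node cluster; it stands in for, and greatly simplifies, the deep-RL formulation described above. The containers, the interfering pair, and the throughput proxy are all assumptions for illustration.

```python
import random

random.seed(0)
CONTAINERS = ["web", "db", "cache"]        # LRAs to place, in order
INTERFERE = {frozenset({"web", "db"})}     # assumed interfering pair
NODES = [0, 1]

def reward(placement):
    """Throughput proxy: 1 per placed container, minus 2 per
    interfering pair co-located on the same node."""
    r = len(placement)
    for a in placement:
        for b in placement:
            if a < b and placement[a] == placement[b] \
                    and frozenset({a, b}) in INTERFERE:
                r -= 2
    return r

Q = {}  # (placement step, node) -> learned value
for episode in range(500):
    placement = {}
    for step, c in enumerate(CONTAINERS):
        if random.random() < 0.2:  # explore a random node
            node = random.choice(NODES)
        else:                      # exploit the learned values
            node = max(NODES, key=lambda n: Q.get((step, n), 0.0))
        placement[c] = node
    r = reward(placement)
    for step, c in enumerate(CONTAINERS):  # update toward observed reward
        key = (step, placement[c])
        Q[key] = Q.get(key, 0.0) + 0.1 * (r - Q.get(key, 0.0))

# Greedy placement from the learned values.
best = {c: max(NODES, key=lambda n: Q.get((s, n), 0.0))
        for s, c in enumerate(CONTAINERS)}
print(best, reward(best))
```

The real system replaces the table with a neural network over rich cluster state and measures reward from actual workload throughput, but the feedback loop (place, observe performance, update the policy) is the same shape.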