THESIS
2022
1 online resource (xii, 117 pages) : illustrations (some color)
Abstract
With the proliferation of data emerges a myriad of dataflow frameworks. When they
are deployed in a datacenter and productized as a service, their performance and cost
become two primary concerns. However, performance issues prevail in dataflow computation.
Their diagnosis is complicated by the heterogeneity of dataflow frameworks
because the frameworks differ in underlying design, application domain, and computation
complexity. This heterogeneity makes it hard for service providers and users to debug and localize problems. Performance issues also drive up resource costs: the datacenter operator cannot easily determine an allocation that guarantees stable performance, leading to unwanted resource waste.
To tackle the challenges of performance and cost, the dissertation first characterizes
dataflow computation in a large datacenter by analyzing a recently released workload
trace. It examines the static properties of job DAGs and the runtime characteristics of
their task execution. Statically, the DAGs are discovered to exhibit high artificiality when
compared with random graphs. The dependent tasks may have significant variability in
resource usage and duration—–even for recurring tasks. The results confirm the challenge of performance debugging and resource allocation.
To diagnose performance issues, the dissertation enables resource observability by proposing CrystalPerf, a new approach that learns to characterize the performance of dataflow computation through code analysis. It requires no code instrumentation
and applies to a wide variety of dataflow frameworks. Our key insight is that the
source code of an operation contains learnable syntactic and semantic patterns that reveal
how it uses resources. Our approach establishes a performance-resource model that,
given a dataflow program, infers automatically how much time each operation has spent
on each resource (e.g., CPU, network, disk) from past execution traces and the program
source code, using machine learning techniques. Extensive evaluations and real-world
case studies show that CrystalPerf can predict job performance and accurately detect runtime
bottlenecks of DAG jobs.
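As a toy illustration of the intuition, not CrystalPerf's actual learned model, one can imagine splitting an operation's observed wall time across resources in proportion to resource-suggestive keywords found in its source; the keyword lists and the example snippet below are invented for this sketch:

```python
import re

# Hypothetical keyword cues: tokens in an operation's source that hint at
# which resource its time is spent on. The real system *learns* syntactic
# and semantic patterns; this fixed table is purely illustrative.
RESOURCE_CUES = {
    "cpu":     ["map", "hash", "sort", "compute"],
    "network": ["fetch", "shuffle", "send", "recv"],
    "disk":    ["read", "write", "spill", "flush"],
}

def resource_breakdown(source, total_time):
    """Split an operation's observed time across resources in proportion
    to cue-keyword counts in its source (toy heuristic)."""
    tokens = re.findall(r"\w+", source.lower())
    counts = {r: sum(tokens.count(k) for k in cues)
              for r, cues in RESOURCE_CUES.items()}
    total = sum(counts.values()) or 1  # avoid dividing by zero
    return {r: total_time * c / total for r, c in counts.items()}

src = "data = read(path)\nsend(shuffle(data))"
print(resource_breakdown(src, 9.0))
# network gets 6.0s, disk 3.0s, cpu 0.0s of the 9.0s observed
```

The point of the sketch is only the shape of the output: a per-resource time attribution inferred from source text plus an execution trace, which is what lets a bottleneck (here, network) be flagged without instrumenting the job.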
To reduce resource costs, the dissertation proposes Owl, an overcommitted scheduler for executing dataflow computation on serverless platforms. It achieves high utilization without compromising performance through a dual approach. (1) For infrequently invoked functions, it allocates resources to sandboxes with a usage-based heuristic, continuously monitors their performance, and remedies any detected degradation. (2) For frequently invoked functions, Owl profiles the interference patterns among collocated functions and places sandboxes under the guidance of these profiles. Owl further consolidates idle sandboxes to reduce resource waste. We prototype Owl in our production system and implement a representative benchmark suite to evaluate it. The results demonstrate that the prototype reduces VM cost by 43.80% and effectively mitigates latency degradation with negligible overhead.
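To give a flavor of profile-guided overcommitted placement, here is a greedy sketch under assumed inputs; the abstract does not describe Owl's scheduler at this level, so the overcommit ratio, the conflict-set representation, and all names below are hypothetical. Hosts may be packed beyond nominal capacity, but function pairs that the interference profile flags are never collocated:

```python
def place(sandboxes, hosts, capacity, overcommit=1.5, conflicts=frozenset()):
    """Greedy overcommitted placement: a host may be filled up to
    overcommit * capacity, but two functions whose pair appears in
    `conflicts` (an interference profile) are never collocated."""
    placement = {h: [] for h in hosts}
    for fn, demand in sandboxes:
        for h in hosts:
            used = sum(d for _, d in placement[h])
            neighbors = {f for f, _ in placement[h]}
            if (used + demand <= overcommit * capacity
                    and all(frozenset((fn, f)) not in conflicts
                            for f in neighbors)):
                placement[h].append((fn, demand))
                break
        else:
            raise RuntimeError(f"no feasible host for {fn}")
    return placement

# Two interfering functions ("a", "b") land on different hosts even though
# a single overcommitted host could fit both by capacity alone.
result = place(
    sandboxes=[("a", 1.0), ("b", 1.0)],
    hosts=["h1", "h2"],
    capacity=1.5,
    conflicts={frozenset(("a", "b"))},
)
print(result)  # {'h1': [('a', 1.0)], 'h2': [('b', 1.0)]}
```

The design point the sketch captures is the trade-off in the paragraph above: overcommitment (here, `overcommit * capacity`) recovers utilization, while the profiled conflict set is what prevents that overcommitment from degrading latency.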