THESIS
2022
1 online resource (xv, 73 pages) : illustrations (some color)
Abstract
Real-world datacenter networks exhibit high environmental variations. For instance, the
base RTT, which is commonly assumed to be stable, can vary by up to 2.68× because of the varying
processing delay introduced by network components such as the networking stack, middleboxes,
and hypervisors. Besides RTT variations, datacenters also exhibit other environmental
variations, e.g., in traffic patterns, topology, and failures, which pose challenges
for transport design in datacenter networks.
At the algorithm level, high environmental variations make it difficult for heuristic ECN-based
transports to deliver optimal performance. One concrete example is that RTT
variations make it difficult for datacenter operators to derive a proper ECN marking threshold
that simultaneously delivers high throughput, low latency, and good burst tolerance.
Furthermore, we find that adaptive neural network (NN) driven transports
can learn and adapt to the varying environment, which shows their potential to succeed
in datacenter networking with high environmental variations. However, at the deployment level,
current NN-based transports fail to deliver optimal performance, leading to either
performance loss or large overhead.
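To make the threshold problem concrete, the following minimal sketch assumes the commonly used rule of thumb that an ECN marking threshold scales with the bandwidth-delay product, K = λ × C × RTT. The function name, the 10 Gbps link, the 100 µs base RTT, and the default λ = 1 are illustrative assumptions, not values taken from the thesis; only the 2.68× factor comes from the text above.

```python
def ecn_threshold_bytes(link_gbps: float, base_rtt_us: float, lam: float = 1.0) -> float:
    """Rule-of-thumb marking threshold K = lam * C * RTT, in bytes.

    lam is a scheme-dependent scaling factor (an assumption here).
    """
    capacity_bytes_per_s = link_gbps * 1e9 / 8   # link capacity C in bytes/s
    rtt_s = base_rtt_us / 1e6                    # base RTT in seconds
    return lam * capacity_bytes_per_s * rtt_s


# Hypothetical 10 Gbps link with a 100 us base RTT.
low = ecn_threshold_bytes(10, 100)
# The same link when the base RTT is 2.68x larger.
high = ecn_threshold_bytes(10, 100 * 2.68)

print(f"threshold at base RTT:      {low:.0f} bytes")
print(f"threshold at 2.68x the RTT: {high:.0f} bytes")
```

Under this rule the threshold that suits the largest RTT is 2.68 times the one that suits the smallest, so a single static value is either too high for short-RTT flows (extra queueing delay) or too low for long-RTT flows (lost throughput).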
This thesis describes our research efforts in designing efficient transports for datacenter
networking with high environmental variations. First, to solve the problem of degraded performance
with high RTT variations, we propose a new heuristic ECN-based transport, ECN♯.
ECN♯ extends the current ECN marking mechanism to consider both instantaneous
and persistent congestion. Our evaluations show that ECN♯ can effectively reduce latency
without hurting throughput. For example, compared to the current practice, ECN♯ achieves
up to 23.4% (31.2%) lower average (99th percentile) flow completion time (FCT) for short
flows while delivering similar FCT for large flows under production workloads. Second, to
make adaptive NN-based transports available for datacenter networking, we propose LiteFlow, a hybrid framework for deploying high-performance adaptive NNs in the kernel
datapath. LiteFlow decouples the control path of adaptive NNs into a kernel-space fast path for
efficient model inference and a userspace slow path for effective model tuning. We evaluate
LiteFlow with two real-world NN-based congestion control (CC) schemes. Experimental results
show that, in terms of flow goodput, LiteFlow with these NNs outperforms userspace-deployed NNs
by up to 44.4% while incurring no more overhead than kernel-space CC algorithms such as
BBR and CUBIC.
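The fast-path/slow-path split described for LiteFlow can be pictured with a small, purely illustrative sketch. It is not LiteFlow's implementation: the class name, the linear model, the gradient step, and the drift-based synchronization rule are all assumptions made for this example. The only idea taken from the abstract is that inference runs on a cheap fast path while tuning happens separately on a slow path.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HybridController:
    """Toy model of a fast inference path fed by a separately tuned slow path."""
    weights: List[float]           # slow-path model, tuned online (userspace in the abstract)
    snapshot: List[float]          # frozen copy used for inference (kernel space in the abstract)
    sync_threshold: float = 0.05   # hypothetical drift bound that triggers a snapshot update

    def fast_path_infer(self, features: List[float]) -> float:
        # Inference only touches the frozen snapshot, so it stays cheap.
        return sum(w * x for w, x in zip(self.snapshot, features))

    def slow_path_tune(self, features: List[float], target: float, lr: float = 0.01) -> None:
        # One gradient step on squared error; stands in for real model tuning.
        pred = sum(w * x for w, x in zip(self.weights, features))
        err = target - pred
        self.weights = [w + lr * err * x for w, x in zip(self.weights, features)]

    def maybe_sync(self) -> bool:
        # Assumed policy: refresh the snapshot only when the tuned model has drifted enough.
        drift = max(abs(a - b) for a, b in zip(self.weights, self.snapshot))
        if drift > self.sync_threshold:
            self.snapshot = list(self.weights)
            return True
        return False


ctrl = HybridController(weights=[0.0, 0.0], snapshot=[0.0, 0.0])
before = ctrl.fast_path_infer([1.0, 1.0])    # uses the stale snapshot
ctrl.slow_path_tune([1.0, 1.0], target=10.0)
still_stale = ctrl.fast_path_infer([1.0, 1.0])
synced = ctrl.maybe_sync()
after = ctrl.fast_path_infer([1.0, 1.0])     # uses the refreshed snapshot
```

In this sketch, tuning never blocks inference: the fast path keeps answering from its snapshot until the slow path decides an update is worth delivering.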