THESIS
2019
ix, 40 pages : illustrations ; 30 cm
Abstract
Dataflow is a prevailing programming paradigm for processing data in a distributed fashion. When programming dataflow applications, programmers express a data processing task as a dataflow graph, with vertices representing specific operations and edges representing input/output relations or dataflow dependencies between operations. When deployed inside a data center with a large number of available processors, the dataflow graph is partitioned and placed onto different processors for improved processing throughput. For graph edges that cross partition boundaries, the distributed execution engine needs to transfer data chunks between processors. Increasing data volumes and the growing processing power of individual processors often turn the inter-processor links into a communication bottleneck, resulting in severe performance degradation for distributed dataflow applications.
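
To make the partitioning concrete, the following small C sketch (an illustration for this abstract, not code from the thesis) places four hypothetical operations on two processors and flags the edges whose endpoints land on different processors; those are exactly the edges the execution engine must serve over the network.

/* Minimal sketch: a partitioned dataflow graph and the edges that
 * induce inter-processor communication. Placement is hypothetical. */
#include <stdio.h>

#define NUM_VERTICES 4
#define NUM_EDGES    3

typedef struct { int src, dst; } Edge;

int main(void) {
    /* Vertex i (an operation) runs on processor placement[i]. */
    int placement[NUM_VERTICES] = {0, 0, 1, 1};

    /* Dataflow dependencies between operations (vertex indices). */
    Edge edges[NUM_EDGES] = {{0, 1}, {1, 2}, {2, 3}};

    for (int i = 0; i < NUM_EDGES; i++) {
        int p_src = placement[edges[i].src];
        int p_dst = placement[edges[i].dst];
        if (p_src != p_dst)
            printf("edge %d->%d crosses partitions: network transfer %d -> %d\n",
                   edges[i].src, edges[i].dst, p_src, p_dst);
        else
            printf("edge %d->%d is local to processor %d\n",
                   edges[i].src, edges[i].dst, p_src);
    }
    return 0;
}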
In recent years, Remote Direct Memory Access (RDMA) has become widely deployed in data centers as an alternative to the Transmission Control Protocol (TCP). RDMA offers ultra-low latency and CPU-bypass networking to application programmers. Existing applications, however, are often designed around a socket-based software stack that manages application buffers separately from networking buffers and copies memory between them when sending and receiving data. With large application buffers (up to hundreds of megabytes), the cost of such copies adds non-trivial overhead to the end-to-end communication pipeline. In this work, we design a zero-copy transport for distributed dataflow applications that unifies application and networking buffer management and eliminates unnecessary memory copies. Our prototype on top of TensorFlow shows a 2.43x performance improvement over the gRPC-based transport and a 1.21x improvement over an alternative RDMA transport that uses separate buffers and memory copies.
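
The contrast between the two buffer-management schemes can be sketched with the libibverbs API. The snippet below is a hedged illustration, not the thesis's implementation: queue-pair setup, completion polling, and error handling are omitted, and the protection domain `pd`, queue pair `qp`, and pre-registered networking buffer `net_mr` are assumed to be initialized elsewhere. The copy-based path copies the application buffer into a registered networking buffer before posting a send, while the zero-copy path registers the application buffer itself so the NIC can DMA directly from it.

/* Sketch only: contrasts copy-based and zero-copy send paths. */
#include <infiniband/verbs.h>
#include <string.h>
#include <stdint.h>

static int post_send(struct ibv_qp *qp, void *addr, uint32_t len, uint32_t lkey) {
    struct ibv_sge sge = {
        .addr   = (uintptr_t)addr,
        .length = len,
        .lkey   = lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;
    return ibv_post_send(qp, &wr, &bad_wr);
}

/* Copy-based path: the application buffer is first copied into a
 * separately registered networking buffer. With buffers of hundreds
 * of megabytes, this memcpy dominates the communication pipeline. */
int send_with_copy(struct ibv_qp *qp, struct ibv_mr *net_mr,
                   const void *app_buf, uint32_t len) {
    memcpy(net_mr->addr, app_buf, len);          /* the extra copy */
    return post_send(qp, net_mr->addr, len, net_mr->lkey);
}

/* Zero-copy path: the application buffer itself is registered with
 * the NIC, so the hardware DMAs directly from it and no copy occurs. */
int send_zero_copy(struct ibv_pd *pd, struct ibv_qp *qp,
                   void *app_buf, uint32_t len) {
    struct ibv_mr *mr = ibv_reg_mr(pd, app_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;
    /* Deregistration (ibv_dereg_mr) is elided in this sketch. */
    return post_send(qp, app_buf, len, mr->lkey);
}

In practice a zero-copy transport would also cache memory registrations across sends, since ibv_reg_mr is itself expensive; the sketch above elides that detail.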