THESIS
2020
xii, 106 pages : illustrations ; 30 cm
Abstract
Data-parallel primitives, such as gather, scatter, scan (prefix sum), and split, are widely
used in parallel programs. Multi-way joins are a common operator in data analytics applications.
In this thesis, we design and implement efficient algorithms for these primitives
and join operators on heterogeneous processors, including multi-core CPUs, Intel Xeon
Phi (KNC) processors, and Graphics Processing Units (GPUs).
Specifically, we first revisit the performance of scatter and gather on new-generation
GPUs, and propose a new model for their optimization. We then propose optimization
strategies for these two primitives as well as scan and split that work well for an Intel
multi-core CPU, an NVIDIA GPU, and a KNC. Finally, we propose a GPU-based multi-way
hash join solution that eff...[
Read more ]
Data-parallel primitives, such as gather, scatter, scan (prefix sum), and split, are widely
used in parallel programs. Multi-way joins are a common operator in data analytics applications.
In this thesis, we design and implement efficient algorithms for these primitives
and join operators on heterogeneous processors, including multi-core CPUs, Intel Xeon
Phi (KNC) processors, and Graphics Processing Units (GPUs).
Specifically, we first revisit the performance of scatter and gather on new-generation
GPUs, and propose a new model for their optimization. We then propose optimization
strategies for these two primitives as well as scan and split that work well for an Intel
multi-core CPU, an NVIDIA GPU, and a KNC. Finally, we propose a GPU-based multi-way
hash join solution that effectively utilizes the primitives to achieve high bandwidth
utilization on GPUs. Our core idea is to design a warp-based parallelization strategy for
reducing thread divergence and for coalescing memory accesses. We further enhance
our implementation with a set of GPU-friendly optimizations, including dynamic skew
handling for load balance and a sampling-based probing strategy for result counting. We
experimentally compare our method with cascaded pairwise hash joins as well as merge-based
multi-way joins on GPUs and CPUs. The results show that our method outperforms
prior work.
Post a Comment