THESIS
2020
xii, 110 pages : illustrations ; 30 cm
Abstract
This thesis aims at speeding up graph structural clustering algorithms, including a pruning-based
structural clustering algorithm called pSCAN and a truss decomposition algorithm.
Such algorithms are often slow due to their intensive computation on structural similarity.
Therefore, we propose to parallelize these algorithms and optimize them on modern processors.
Specifically, we parallelize the pSCAN algorithm on multi-core CPUs and Intel Xeon Phi
Processors (KNL) with multiple threads and vectorized instructions. Our resulting ppSCAN
algorithm is scalable on both CPU and KNL with respect to the number of threads. We
further propose to accelerate the time-consuming common-neighbor counting operation in
ppSCAN on a multi-core CPU, a KNL, and an NVIDIA GPU. Our results show that...[
Read more ]
This thesis aims at speeding up graph structural clustering algorithms, including a pruning-based
structural clustering algorithm called pSCAN and a truss decomposition algorithm.
Such algorithms are often slow due to their intensive computation on structural similarity.
Therefore, we propose to parallelize these algorithms and optimize them on modern processors.
Specifically, we parallelize the pSCAN algorithm on multi-core CPUs and Intel Xeon Phi
Processors (KNL) with multiple threads and vectorized instructions. Our resulting ppSCAN
algorithm is scalable on both CPU and KNL with respect to the number of threads. We
further propose to accelerate the time-consuming common-neighbor counting operation in
ppSCAN on a multi-core CPU, a KNL, and an NVIDIA GPU. Our results show that a
bitmap-based algorithm works best on both the CPU and the GPU and that a merge-based
pivot-skip algorithm works best on the KNL for common neighbor counting. Finally, we
propose to accelerate truss decomposition, which divides a graph into a hierarchy of subgraphs,
or trusses. Our main idea is to compact intermediate results to optimize memory
access, dynamically adjust the computation based on data characteristics, and parallelize the
algorithm on both the multi-core CPU and the GPU. We evaluate the effects of individual
techniques, and our implementations on both platforms outperform the state of the art by
up to an order of magnitude.
Post a Comment