THESIS
2020
xi, 92 pages : illustrations ; 30 cm
Abstract
Graph data are ubiquitous in practice, and subgraph matching is a basic operation in graph
analytics. As modern computers typically have sufficient processing power and memory
resources, we study how to accelerate in-memory subgraph matching on a single computer.
Our approach is through algorithmic optimization and parallelization.
First, we design PSM, a parallel algorithmic framework for backtracking-based subgraph
matching. Specifically, we abstract the matching process as a depth-first search
(DFS) tree, and parallelize the search by dividing up the tree into search regions and exploring
these regions in parallel. We design an effective dynamic load balance strategy for
the search. As a result, the PSM-style algorithms achieved a speedup of 15X-19X over
their sequential cou...[
Read more ]
Graph data are ubiquitous in practice, and subgraph matching is a basic operation in graph
analytics. As modern computers typically have sufficient processing power and memory
resources, we study how to accelerate in-memory subgraph matching on a single computer.
Our approach is through algorithmic optimization and parallelization.
First, we design PSM, a parallel algorithmic framework for backtracking-based subgraph
matching. Specifically, we abstract the matching process as a depth-first search
(DFS) tree, and parallelize the search by dividing up the tree into search regions and exploring
these regions in parallel. We design an effective dynamic load balance strategy for
the search. As a result, the PSM-style algorithms achieved a speedup of 15X-19X over
their sequential counterparts on a twenty-core machine.
Second, we develop LIGHT, a parallel subgraph matching algorithm for unlabeled
graphs. The search space in unlabeled subgraph matching, or called subgraph enumeration,
is big due to the absence of vertex labels. Our approach to reducing the search space
and thus redundant computation is to defer the materialization of intermediate results and
to increase their reuse. Furthermore, we parallelize our algorithm with vector and simultaneous
multithreading instructions in modern computers. Our results show that LIGHT
significantly outperforms the state of the art.
Finally, we design vcFV, an efficient framework for finding all data graphs in a graph
database each of which contains the given query graph. Existing work takes the indexing-filtering-verification approach to evaluate queries. However, the indexing phase suffers
serious scalability problems. Our framework utilizes the leading subgraph matching algorithms to answer queries, which removes the indexing phase. The experiment results
show that our framework is competitive in time performance and can scale to hundreds of
thousands of data graphs and graphs of thousands of vertices.
Post a Comment