THESIS
2020
xiv, 136 pages : illustrations ; 30 cm
Abstract
Similarity and causality mining on graphs are common and fundamental requirements
for graph-based applications. Specifically, the similarities on graphs can be separated into
two categories: the similarity between graphs (i.e., graph similarity), and the similarity
between nodes (i.e., node similarity) on a graph. A common graph similarity measure is
the Graph Edit Distance (GED). In this work, we propose a probabilistic approach, GBDA,
to approximate GED between two graphs according to the Graph Branch Distance (GBD)
between them. On the other hand, we propose an efficient index, G*-tree, to compute the
node similarity measured by the distance between two nodes, and we propose a shortcut
selection algorithm to optimize the efficiency of distance queries on G*-tree. Moreover, we...[
Read more ]
Similarity and causality mining on graphs are common and fundamental requirements
for graph-based applications. Specifically, the similarities on graphs can be separated into
two categories: the similarity between graphs (i.e., graph similarity), and the similarity
between nodes (i.e., node similarity) on a graph. A common graph similarity measure is
the Graph Edit Distance (GED). In this work, we propose a probabilistic approach, GBDA,
to approximate GED between two graphs according to the Graph Branch Distance (GBD)
between them. On the other hand, we propose an efficient index, G*-tree, to compute the
node similarity measured by the distance between two nodes, and we propose a shortcut
selection algorithm to optimize the efficiency of distance queries on G*-tree. Moreover, we
propose a multi-view learning method, TransN, to encode the similarity between nodes
into node embeddings. In TransN, we propose an algorithm to capture the proximity
information inside every single view. To transfer information across views, we propose
an algorithm to translate the node embeddings between different views based on the dual-learning
mechanism.
The causality on knowledge graphs is represented by the reasoning paths for each fact. The state-of-the-art solution of the reasoning path finding is to train a reinforcement
learning agent which selects some reasoning paths for a given fact. However, the existing
methods cannot distinguish the quality of different reasoning paths well. To address this
problem, in this paper, we model the problem of finding high-quality reasoning paths as
an optimization problem on the embedding space, which is NP-hard. Then, we propose a
novel approximation method, RETINA, to search the reasoning paths of high quality. To
enhance the efficiency of RETINA, we propose a set of lower bounds to prune reasoning
paths in the search process.
The experiments show that our proposed methods have better efficiency, effectiveness,
and scalability than the state-of-the-art methods on real and synthetic data sets.
Post a Comment