Mining similarity and causality on graphs

HKUST Electronic Theses

Mining similarity and causality on graphs

by Zijian Li

THESIS 2020

Ph.D. Computer Science and Engineering

xiv, 136 pages : illustrations ; 30 cm

Abstract

Similarity and causality mining on graphs are common and fundamental requirements for graph-based applications. Specifically, the similarities on graphs can be separated into two categories: the similarity between graphs (i.e., graph similarity), and the similarity between nodes (i.e., node similarity) on a graph. A common graph similarity measure is the Graph Edit Distance (GED). In this work, we propose a probabilistic approach, GBDA, to approximate GED between two graphs according to the Graph Branch Distance (GBD) between them. On the other hand, we propose an efficient index, G*-tree, to compute the node similarity measured by the distance between two nodes, and we propose a shortcut selection algorithm to optimize the efficiency of distance queries on G*-tree. Moreover, we propose a multi-view learning method, TransN, to encode the similarity between nodes into node embeddings. In TransN, we propose an algorithm to capture the proximity information inside every single view. To transfer information across views, we propose an algorithm to translate the node embeddings between different views based on the dual-learning mechanism.

The causality on knowledge graphs is represented by the reasoning paths for each fact. The state-of-the-art solution of the reasoning path finding is to train a reinforcement learning agent which selects some reasoning paths for a given fact. However, the existing methods cannot distinguish the quality of different reasoning paths well. To address this problem, in this paper, we model the problem of finding high-quality reasoning paths as an optimization problem on the embedding space, which is NP-hard. Then, we propose a novel approximation method, RETINA, to search the reasoning paths of high quality. To enhance the efficiency of RETINA, we propose a set of lower bounds to prune reasoning paths in the search process.

The experiments show that our proposed methods have better efficiency, effectiveness, and scalability than the state-of-the-art methods on real and synthetic data sets.

[ Hide abstract ]

View Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection HKUST Electronic Theses Degree Ph.D. Department Computer Science and Engineering Supervisors Chen, Lei Authors Li, Zijian Subjects Computer graphics Data processing Graph theory Graph algorithms Language English Call number Thesis CSED 2020 Li DOI 10.14711/thesis-991012818569203412

Full record

Mining similarity and causality on graphs

by Zijian Li

Post a Comment Cancel reply