THESIS
2019
xii, 112 pages : illustrations ; 30 cm
Abstract
Measuring similarities among different nodes is a fundamental problem in graph analysis.
Among different similarity measurements, SimRank is one of the most promising
and popular. Due to the huge amount of data generated by the activities of people every
data, today’s graphs are often big and time evolving, examples include on-line social networks
and on-line shopping platforms. This challenges current algorithms for SimRank
computation w.r.t. efficiency, scalability and quality of results. In the thesis, we first study
the problem of all-pairs SimRank computation. Observing current iterative methods for
all-pairs SimRank are not efficient in time and space, due to unnecessary cost and storage
by the nature of iterative updating, we propose a local push based algorithm, which
h...[
Read more ]
Measuring similarities among different nodes is a fundamental problem in graph analysis.
Among different similarity measurements, SimRank is one of the most promising
and popular. Due to the huge amount of data generated by the activities of people every
data, today’s graphs are often big and time evolving, examples include on-line social networks
and on-line shopping platforms. This challenges current algorithms for SimRank
computation w.r.t. efficiency, scalability and quality of results. In the thesis, we first study
the problem of all-pairs SimRank computation. Observing current iterative methods for
all-pairs SimRank are not efficient in time and space, due to unnecessary cost and storage
by the nature of iterative updating, we propose a local push based algorithm, which
has the property that not all SimRank scores are involved in the computation. The push-on-demand schema can reduce a lot of unnecessary cost, and has an accuracy guarantee.
We further extend our algorithm to track accurate SimRank scores in dynamic graphs,
which can address the accuracy issue of current incremental solutions. We then study the
pairwise SimRank estimation problem, observing that current single-pair SimRank solutions
are either static or inefficient in handling dynamic cases with good-quality results, we propose three algorithms to query pairwise SimRank over static and dynamic graphs
efficiently, by using different sample reduction strategies. The accuracy of our algorithms
is guaranteed by the different invariants we propose for pairwise SimRank. Finally, we
study the problem finding similar pairs given a set of node pairs with SimRank, which
has attractive applications in personalized search and recommendation tasks. We present
an efficient framework for retrieving the top-k similarities from an arbitrary set of pairs.
In addition, we introduce two types of indexes to boost the efficiency, one is hub-based,
the other is tree-based.
Post a Comment