THESIS
2019
xvi, 69 pages : illustrations ; 30 cm
Abstract
In this thesis, we focus on two problems, i.e. graph learning and clustering. On the problem
of learning the graph from the observation of the signals over the nodes, we propose a
new framework to effectively and efficiently solve it. Specifically, we focus on estimating
the direct relationships between different nodes in an undirected graph, given the observable
indirect relationships implied by a diffusion process over the graph. Based on the fact that
the spectral template of the covariance matrix, i.e., the eigenvectors, coincides with that of
the graph shift operator, we propose multiple formulations and the corresponding algorithms
to estimate the operator: i) we introduce a Frobenius norm penalty element to learn a sparse
graph structure and solve it with the majorization...[
Read more ]
In this thesis, we focus on two problems, i.e. graph learning and clustering. On the problem
of learning the graph from the observation of the signals over the nodes, we propose a
new framework to effectively and efficiently solve it. Specifically, we focus on estimating
the direct relationships between different nodes in an undirected graph, given the observable
indirect relationships implied by a diffusion process over the graph. Based on the fact that
the spectral template of the covariance matrix, i.e., the eigenvectors, coincides with that of
the graph shift operator, we propose multiple formulations and the corresponding algorithms
to estimate the operator: i) we introduce a Frobenius norm penalty element to learn a sparse
graph structure and solve it with the majorization-minimization and block coordinate descent
frameworks, achieving a significantly smaller recovery error and faster convergence speed
than the benchmark methods; ii) not limited by some approximate functions for the l
0-‘norm’,
we derive a new method to minimize l
0-'norm’ directly, achieving even smaller error; and
iii) considering prior knowledge of miscellaneous graph topologies in practice, we derive a
new family of algorithms to learn the graph shift operator following specific topologies, such
as k-sparse graphs, trees, forests, and bipartite graphs, given spectral templates. Our methods
can be used to recover the graph structure as long as the spectral template is available.
Extensive numerical experiments demonstrate that our methods can be used to estimate the
weighted adjacency matrix and inferring meaningful connections efficiently and effectively.
The results show that our proposed methods can achieve superior or equivalent performance
to the benchmark methods on all the considered tasks. When the weighted adjacency matrix,
i.e. the graph structure, is known, some further works can be done to retrieve more valuable
information rooted in the graphs. Graph clustering is one of them, which is to allocate
those connected closely vertices into the same clusters while dividing those connected weakly
into different ones. In the field of graph clustering, directed graph (digraph) clustering is
more challenging than the undirected one, because directed edges, unlike undirected ones, do
not indicate an essential feature for finding plausible clusters, the connectivity, in a simple
and obvious way. Naturally, the interactions between vertices and the clusters are based on
the bi-directional paths between them. Thus, strong connectivity built by interactions can
be implied by the massive bi-directional connections. Observed this point, we propose the
normalized mutual cut (NMcut) for evaluating the goodness of clustering in digraphs. It is justified that the clusters of the balanced volumes with inner vertices connected strongly
in double directions can be found with low NMcut. Meanwhile, because the numbers and
strengths of the paths between a pair of vertices or subgraphs may not be equal reciprocally,
the flows along these paths may also follow specific orientations. We name these orientations
“tendencies” and try to find a powerful method that can retrieve both sensible clusters but
also their pair-wise tendencies. Although satisfactory clusters can be obtained with a low
NMcut, we prove that minimizing it is NP-hard. Then, to solve the problem efficiently, for its
particular case on two clusters, we embed it into continuous variables and propose an algorithm
with closed-form solutions. Based on this, a “top-down” hierarchical, a.k.a. “divisive”,
algorithm is proposed to retrieve any number of clusters with low pair-wise connectivity. Furthermore,
we derive a simultaneous algorithm to obtain clusters of low global connectivity
by embedding vertices. The tendencies between clusters are shown to be indicated by the
algorithms without extra costs. We prove the convergence of the algorithms and verify their
effectiveness and efficiency by extensive synthetic and empirical experiments. The experimental
results show that our methods can achieve superior or equivalent performance to the
benchmark approaches, with applications to the fields such as the community detection.
Post a Comment