THESIS
2018
xvi, 99 pages : illustrations ; 30 cm
Abstract
In this thesis, I study two problems about network data. The first problem
is about network community detection and the second problem is about the
application of network data in recommender systems.
In the first part of this thesis, we propose a new method for network community
detection, with an emphasis on sparse networks. Previous methods focused
mainly on dense scenarios, where the average degree of nodes E(degree) = Ω(log n). Recently, sparse networks have received increasing attention amongst
researchers. The new proposed method uses the symmetrized Laplacian inverse
matrix (SLIM) to measure the closeness between nodes. The idea comes from the
first hitting time in random walk, and it also has a nice interpretation in diffusion
maps. Community membership is acquired by...[
Read more ]
In this thesis, I study two problems about network data. The first problem
is about network community detection and the second problem is about the
application of network data in recommender systems.
In the first part of this thesis, we propose a new method for network community
detection, with an emphasis on sparse networks. Previous methods focused
mainly on dense scenarios, where the average degree of nodes E(degree) = Ω(log n). Recently, sparse networks have received increasing attention amongst
researchers. The new proposed method uses the symmetrized Laplacian inverse
matrix (SLIM) to measure the closeness between nodes. The idea comes from the
first hitting time in random walk, and it also has a nice interpretation in diffusion
maps. Community membership is acquired by applying the spectral method to
the SLIM. We show that the SLIM ensures the misclassification rate of order
O(log(max nP
i,j)/max nP
i,j), for the stochastic block model (SBM) as sparse as
E(degree) → ∞, with edge probability matrix P. The SLIM outperforms other
methods in many simulations and real datasets. It's robust with respect to the
choice of tuning parameter, in contrast to normalized spectral clustering with
regularization.
In the second part of this thesis, we present the NetRec method to leverage the
network data in collaborative filtering (CF). The recent prevalence of online social
networks provides valuable side information to the technique of CF. Previous
efforts to exploit network data lack theoretical justifications that measure the
improvement achieved by adding the relational data. Moreover, all of them
find a local optimum of the proposed objective function. In this thesis, we
formulate a convex optimization problem by including a new network related
term, which is particularly designed to confine the introduced bias. Our main
results demonstrate that the inclusion of the designed term leads to a sharper
error bound than before, as long as the observed users social network is consistent
with their preferences and is well structured. We point out that the larger the
noise in the observed user preferences, the larger the reduction in the level of error
bound. Moreover, our theory shows that the combination of the network related
term and the nuclear norm term obtains estimates better than those achieved by
any of them alone. We provide an algorithm to solve our new objective function
and prove that it is guaranteed to find the global optimum. Both simulations
and real data experiments are carried out to validate our theoretical findings.
Data on Yelp.com is used to test the new method and substantial improvements
achieved by using the social network data are observed.
Post a Comment