THESIS
2023
1 online resource (xvii, 261 pages) : illustrations (some color)
Abstract
In the past decade, there has been a surge of interest in processing and analyzing mixture
matrix-valued models arising in many scientific fields such as genomics, neuroimaging,
and social network analysis. Owing to the ultra-large dimensions of the observed matrices,
recent works have assumed certain types of low-rankness for the underlying parameters
of interest. Such a low-rankness assumption is also well motivated in practice, as many
datasets exhibit low-rank structure due to shared latent factors or sparse interactions
among variables. Despite the promising empirical performance of various mixture matrix-valued
models, their theoretical properties have been less well justified. This motivates the first part
of the thesis, which focuses on the fundamental limits of estimation in the low-rank Gaussian mixture
model (LrMM) and reveals a statistical-to-computational gap. Another related, yet
more practical task in LrMM is clustering. In the second part of the thesis, we develop
a modified Lloyd’s algorithm that exploits the planted low-rank structure and attains the optimal error
rate. The algorithm has been extensively evaluated on both synthetic and real-world
datasets with appealing performance. The third part of the thesis shifts focus to network
data. We propose a flexible mixture model for multi-layer networks, together with
a tensor-based algorithm that consistently recovers the local and global communities of nodes as
well as the labels of the networks; its application to real datasets yields new and interesting
findings.
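The abstract does not say how the modified Lloyd’s algorithm in the second part uses the planted low-rankness. The sketch below is only a rough illustration that assumes one natural variant. It runs a standard Lloyd (k-means) iteration on matrix-valued observations and truncates each averaged cluster center to rank r with an SVD. The function names `low_rank_project` and `low_rank_lloyd`, the rank-truncation step, the noisy initialization, and the synthetic two-cluster data are all hypothetical. None of them is taken from the thesis.

```python
import numpy as np


def low_rank_project(M, r):
    """Best rank-r approximation of M in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def low_rank_lloyd(X, k, r, init_labels, n_iter=20):
    """Hypothetical Lloyd-type clustering of matrices X with shape (n, p1, p2).

    Assumption for illustration: after each averaging step, every cluster
    center is projected onto the set of rank-r matrices. This is not the
    thesis's algorithm.
    """
    labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        centers = []
        for j in range(k):
            members = X[labels == j]
            # Fall back to the global mean if a cluster becomes empty.
            mean = members.mean(axis=0) if len(members) else X.mean(axis=0)
            centers.append(low_rank_project(mean, r))
        centers = np.stack(centers)
        # Assign each matrix to its nearest center in squared Frobenius distance.
        dists = ((X[:, None, :, :] - centers[None]) ** 2).sum(axis=(2, 3))
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers


# Synthetic example (made-up data): two clusters with rank-1 centers +M and -M.
rng = np.random.default_rng(0)
n, p1, p2, r = 100, 20, 15, 1
u, v = rng.standard_normal(p1), rng.standard_normal(p2)
M = 2.0 * np.outer(u, v)
truth = np.repeat([0, 1], n // 2)
signs = np.where(truth == 0, 1.0, -1.0)
X = signs[:, None, None] * M + rng.standard_normal((n, p1, p2))

# Start from the true labels with 30% of them flipped.
init = truth.copy()
flip = np.arange(n) % 10 < 3
init[flip] = 1 - init[flip]

labels, centers = low_rank_lloyd(X, k=2, r=r, init_labels=init)
accuracy = max(np.mean(labels == truth), np.mean(labels == 1 - truth))
print("clustering accuracy:", accuracy)
```

The rank truncation is there to show one way that a known low-rank structure can denoise the estimated centers. A plain Lloyd iteration would average the full p1 × p2 matrices instead.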