THESIS
2020
xiii, 100 pages : illustrations ; 30 cm
Abstract
Given a data matrix D, a submatrix S of D is an order-preserving submatrix (OPSM) if there is a
permutation of the columns of S, under which the entry values of each row in S are strictly increasing.
OPSM mining is widely used in real-life applications such as identifying coexpressed genes,
and finding customers with similar preference. However, noise is ubiquitous in real data matrices
due to variable experimental conditions and measurement errors, which makes conventional OPSM
mining algorithms inapplicable. No previous work has ever combated uncertain value intervals using
the possible world semantics.
We establish two different definitions of significant OPSMs based on the possible world semantics:
(1) expected support based and (2) probabilistic frequentness based. An optim...[
Read more ]
Given a data matrix D, a submatrix S of D is an order-preserving submatrix (OPSM) if there is a
permutation of the columns of S, under which the entry values of each row in S are strictly increasing.
OPSM mining is widely used in real-life applications such as identifying coexpressed genes,
and finding customers with similar preference. However, noise is ubiquitous in real data matrices
due to variable experimental conditions and measurement errors, which makes conventional OPSM
mining algorithms inapplicable. No previous work has ever combated uncertain value intervals using
the possible world semantics.
We establish two different definitions of significant OPSMs based on the possible world semantics:
(1) expected support based and (2) probabilistic frequentness based. An optimized dynamic
programming approach is proposed to compute the probability that a row supports a particular
column permutation, and several effective pruning rules are introduced to efficiently prune insignificant
OPSMs. To further improve the efficiency of the probabilistic frequentness checking
we proposed one Poisson distribution based and one Gaussian distribution approximation algorithms. These techniques are integrated into our two OPSM mining algorithms, based on prefix-projection
and Apriori respectively. Extensive experiments on real microarray data demonstrate
that the OPSMs found by our algorithms have a much higher quality than those found by existing
approaches.
Post a Comment