THESIS
2020
1 online resource (ix, 45 pages) : color illustrations
Abstract
Discriminative clustering approaches assign data points to different groups by identifying
sparse regions, without explicitly modeling the dataset and categories. Such methods
are flexible and powerful in practice since they make few assumptions. In particular, the
probabilistic Softmax model makes only one assumption, namely that data points
are linearly separable. It is therefore potentially suitable for clustering data processed by
feature transformation techniques. The cluster assumption states that decision
boundaries between clusters should lie in low-density regions. In previous work on discriminative
clustering, this principle has been compromised by the cluster balance consideration,
which is incorporated to avoid degenerate clustering solutions. However, datasets are
rarely balanced with respect to attributes of interest. Furthermore, large clusters from
imbalanced datasets might also contain sparse regions, where decision boundaries should
not be positioned. In this thesis, we present self-optimality, a novel criterion for Softmax
discriminative clustering, which is faithful to the cluster assumption principle and is free
of cluster balance considerations. We also propose an adaptive algorithm aimed at finding
self-optimal solutions, which can accurately recognize clusters from linearly separable
imbalanced datasets with multiple degrees of sparseness.
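
As a rough illustration of the linear-separability assumption behind a Softmax clustering model, the sketch below assigns points to clusters from linear scores. It is not the self-optimality criterion or the adaptive algorithm of the thesis, which the abstract does not specify; the function names, `WEIGHTS`, and `BIASES` are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_posteriors(x, weights, biases):
    """Softmax probabilities over clusters, using one linear score per cluster."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)


def assign_cluster(x, weights, biases):
    """Index of the most probable cluster for point x."""
    probs = cluster_posteriors(x, weights, biases)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical parameters (assumed, not from the thesis): two clusters in 2-D
# whose scores are equal exactly on the line x[0] = 0.
WEIGHTS = [[1.0, 0.0], [-1.0, 0.0]]
BIASES = [0.0, 0.0]

if __name__ == "__main__":
    for point in ([2.0, 0.5], [-3.0, 1.0]):
        print(point, assign_cluster(point, WEIGHTS, BIASES))
```

Because each cluster score is linear in the input, the boundary between any two clusters is a hyperplane, which is why such a model can only separate clusters that are linearly separable.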