THESIS
2019
Abstract
Most recent automated approaches to whole-slide image (WSI) classification employ a two-stage
pipeline: patch-level classification followed by WSI-level aggregation. Training the patch
classifier requires patch-level labels. These either come from exactly labeled patches,
which are time-consuming to acquire, or rest on the assumption that most patches of a WSI
are discriminative for diagnosis. However, discriminative tumor regions often occupy only a small
portion of a WSI. One existing method uses a pre-trained CNN to extract patch-level features and
averages the extracted features to obtain a WSI representation. Pooling directly over all patches
in a WSI dilutes the discriminative power of the rare tumor patches in classification.
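
The dilution effect described above can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration: the feature dimension, the patch counts, and the Gaussian feature distributions are hypothetical and are not taken from the thesis data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 patches per slide, 10 of them tumor, 8-D features.
n_patches, n_tumor, dim = 1000, 10, 8

# Assumption: normal patch features are centred at 0, tumor features at 5.
normal_part = rng.normal(0.0, 1.0, size=(n_patches - n_tumor, dim))
tumor_part = rng.normal(5.0, 1.0, size=(n_tumor, dim))
tumor_slide = np.vstack([normal_part, tumor_part])
normal_slide = rng.normal(0.0, 1.0, size=(n_patches, dim))

# Average pooling over all patches gives one feature vector per slide.
tumor_repr = tumor_slide.mean(axis=0)
normal_repr = normal_slide.mean(axis=0)

# Distance between the two slide-level representations after pooling.
pooled_gap = np.linalg.norm(tumor_repr - normal_repr)

# Distance between the tumor patches alone and the normal slide average.
patch_gap = np.linalg.norm(tumor_part.mean(axis=0) - normal_repr)

print(f"gap after pooling all patches: {pooled_gap:.3f}")
print(f"gap for tumor patches alone:   {patch_gap:.3f}")
```

With 1% tumor patches, the slide-level averages of the tumor and normal slides end up far closer together than the tumor patches are to the normal average, which is the loss of discriminative power the text refers to.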
In this thesis we propose a data-driven feature aggregation approach that automatically
identifies discriminative features. Specifically, we first group the patches of each WSI into
clusters and apply average pooling over the patches within each cluster. This removes redundant
features without losing information that is discriminative for diagnosis. The cluster centroids
are collected as the instances of the WSI bag. Moreover, because every patch cropped from a normal WSI
is itself normal, we fit a Gaussian mixture model (GMM) on the normal patches. Features with low GMM scores are
more likely to come from tumor regions, so the learned GMM is used to select discriminative patches. The
cluster centroids of each WSI are also sorted by their GMM scores. We aggregate the cluster centroids
in two ways: with order-invariant multiple instance learning (MIL) pooling, and with an
independently recurrent neural network (IndRNN) that encodes the order of the sorted sequence.
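
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: k-means is assumed as the clustering algorithm, the number of clusters and GMM components is arbitrary, the ascending sort direction is a guess, the features are synthetic, and the helper names `cluster_pool`, `sort_by_gmm` and `mil_max_pool` are hypothetical. The IndRNN branch is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def cluster_pool(features, n_clusters, seed=0):
    """Cluster one slide's patch features and average-pool within each cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    # One mean vector per non-empty cluster: these are the bag instances.
    return np.stack([features[km.labels_ == k].mean(axis=0)
                     for k in np.unique(km.labels_)])


def sort_by_gmm(bag, gmm):
    """Sort instances by GMM log-likelihood, least normal-like first (assumed order)."""
    scores = gmm.score_samples(bag)
    order = np.argsort(scores)
    return bag[order], scores[order]


def mil_max_pool(bag):
    """Order-invariant MIL pooling: element-wise maximum over the instances."""
    return bag.max(axis=0)


rng = np.random.default_rng(0)
dim = 4

# Synthetic stand-ins for CNN features (assumption): normal near 0, tumor near 6.
normal_patches = rng.normal(0.0, 1.0, size=(500, dim))
slide = np.vstack([rng.normal(0.0, 1.0, size=(300, dim)),
                   rng.normal(6.0, 1.0, size=(20, dim))])

# The GMM is fitted on patches from normal slides only.
gmm = GaussianMixture(n_components=3, covariance_type="diag",
                      random_state=0).fit(normal_patches)

bag = cluster_pool(slide, n_clusters=5)
sorted_bag, sorted_scores = sort_by_gmm(bag, gmm)
slide_repr = mil_max_pool(sorted_bag)

print("GMM scores (ascending):", np.round(sorted_scores, 2))
print("slide representation:", np.round(slide_repr, 2))
```

In this toy setting the centroid with the lowest GMM score is the one formed by the tumor-like patches, so it leads the sorted sequence. Max pooling ignores that order, whereas a recurrent aggregator such as IndRNN would consume the centroids in the sorted order.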
To demonstrate the effectiveness of our method, we apply it to predicting breast cancer metastases
in lymph nodes, where only small regions of tumor WSIs show discriminative tumor patterns.
To the best of our knowledge, we are the first to analyze the data without supervised patch-level
labels while achieving satisfactory results.