THESIS
2019
xiv, 86 pages : illustrations ; 30 cm
Abstract
Convolutional neural networks (CNNs or ConvNets) provide a powerful end-to-end framework
to learn features or representations directly from raw data in a hierarchical manner. Mining
the features of a pre-trained CNN helps us understand the workings of CNNs and eases
the burden of data acquisition and annotation in deep learning. This thesis focuses on mining
the hidden features of a deep CNN trained as a classifier to obtain finer levels of perception
results, and applying them in autonomous driving.
We first develop a CNN-based method for fine-grained categorization where the training
data are limited and the differences between the classes are very subtle. By extracting and
interpreting the hierarchical hidden layer features learned by a CNN, our method obtains
robust object...[
Read more ]
Convolutional neural networks (CNNs or ConvNets) provide a powerful end-to-end framework
to learn features or representations directly from raw data in a hierarchical manner. Mining
the features of a pre-trained CNN helps us understand the workings of CNNs and eases
the burden of data acquisition and annotation in deep learning. This thesis focuses on mining
the hidden features of a deep CNN trained as a classifier to obtain finer levels of perception
results, and applying them in autonomous driving.
We first develop a CNN-based method for fine-grained categorization where the training
data are limited and the differences between the classes are very subtle. By extracting and
interpreting the hierarchical hidden layer features learned by a CNN, our method obtains
robust object and part detection when only the class labels are provided in training, then
leverages these detection results to boost the classification results.
Recently, weakly supervised semantic segmentation has aroused interest, since pixel-wise
labels are expensive to obtain. This thesis secondly proposes a novel weakly-supervised semantic
segmentation method using image-level labels only. The class-specific activation maps
from the well-trained classifiers are used as cues to train a segmentation network. We use
super-pixels to refine the cues, and fuse the cues extracted from both a color-image-trained
classifier and a gray-image-trained classifier to compensate for their incompleteness.
Lastly, we use our weakly-supervised semantic segmentation method to modify a visual
simultaneous localization and mapping (vSLAM) system so that the movable objects in the
environment are not used as landmarks for camera pose estimation. Experimental results
demonstrate the superiority of our proposed method over the state-of-the-art.
Post a Comment