THESIS
2020
xvi, 111 pages : illustrations ; 30 cm
Abstract
Pixel-wise recognition tasks, including semantic segmentation, salient object detection, and unsupervised video object segmentation in this thesis, aim to classify each pixel (point) of the input image (point cloud) into predefined categories. Traditional methods suffer from the poor discriminative ability of hand-crafted features. In this thesis, we delve into the CNN based methods to advance semantic segmentation, salient object detection and unsupervised video
object segmentation.
For semantic segmentation, we firstly introduce a fully dense neural network with an encoder-decoder structure that we abbreviate as FDNet, in which feature maps of all the previous blocks are adaptively aggregated to feedforward as input. On the one hand, it reconstructs the spatial boundaries accurately...[
Read more ]
Pixel-wise recognition tasks, including semantic segmentation, salient object detection, and unsupervised video object segmentation in this thesis, aim to classify each pixel (point) of the input image (point cloud) into predefined categories. Traditional methods suffer from the poor discriminative ability of hand-crafted features. In this thesis, we delve into the CNN based methods to advance semantic segmentation, salient object detection and unsupervised video
object segmentation.
For semantic segmentation, we firstly introduce a fully dense neural network with an encoder-decoder structure that we abbreviate as FDNet, in which feature maps of all the previous blocks are adaptively aggregated to feedforward as input. On the one hand, it reconstructs the spatial boundaries accurately. On the other hand, it learns more efficiently with the more efficient gradient backpropagation. We then present an joint multi-task learning framework for semantic
segmentation and semantic boundary detection. The critical component in the framework is the iterative pyramid context module (PCM), which couples two tasks and stores the shared latent semantics to interact between the two tasks. A novel loss function originated from the dual constraint is designed to improve further the performance for semantic segmentation, which ensures the consistency between semantic mask boundary and boundary groundtruth. The proposed
method is able to generate accurate mask and boundary estimation simultaneously.
For salient object detection, we propose a novel network by building a superpixel hierarchical graph, which is used to guide the context exchanging between superpixels. Instead of evenly and fixedly dividing an image to pixels or patches in existing methods, we take arbitrary-shaped superpixel as a node, and adaptively construct a hierarchical graph with three levels. Each superpixel first aggregates the context information from lower-level superpixels or pixels and the context message is flowing along the hierarchical graph. In addition, we illustrate an end-to-end differentiable morphological active contour model, which iteratively helps to improve the boundary accuracy of salient object. The proposed method achieves better performance compared with other methods and is validated through thorough experiments.
For unsupervised video object segmentation, we propose a discriminative feature network to capture the correlation of frames. To model the long-term dependency of video images, the learned discriminative features, which is extracted from all input images, are used to establish correspondence with all features of test image under conditional random field (CRF) formulation, which is leveraged to enforce consistency between pixels. The proposed method is able to capture and mine the underlying relations of images and discover the common foreground objects.
Post a Comment