Learning discriminative representation for pixel-wise recognition

by Mingmin Zhen

THESIS 2020

Ph.D. Computer Science and Engineering

xvi, 111 pages : illustrations ; 30 cm

Abstract

Pixel-wise recognition tasks aim to classify each pixel (point) of the input image (point cloud) into predefined categories. This thesis considers three such tasks: semantic segmentation, salient object detection, and unsupervised video object segmentation. Traditional methods suffer from the poor discriminative ability of hand-crafted features. In this thesis, we delve into CNN-based methods to advance all three tasks.
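As a minimal illustration of what "classifying each pixel into predefined categories" means in practice, the sketch below turns a per-class score map into a label map by taking the highest-scoring class at every pixel. The function name `classify_pixels` and the `(C, H, W)` score layout are assumptions made for this example, not details taken from the thesis.

```python
import numpy as np


def classify_pixels(scores: np.ndarray) -> np.ndarray:
    """Assign each pixel the index of its highest-scoring class.

    scores: array of shape (C, H, W), one score map per category
            (hypothetical layout chosen for this sketch).
    returns: integer label map of shape (H, W).
    """
    if scores.ndim != 3:
        raise ValueError("expected scores with shape (C, H, W)")
    # argmax over the class axis gives one category index per pixel
    return scores.argmax(axis=0)


# Toy input: 3 categories on a 2x2 image.
scores = np.zeros((3, 2, 2))
scores[1, 0, 0] = 1.0  # pixel (0, 0) favours category 1
scores[2, 1, 1] = 1.0  # pixel (1, 1) favours category 2
labels = classify_pixels(scores)
```

In a CNN-based pipeline the score map would come from the network's final layer; here it is filled in by hand so the example stays self-contained.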

For semantic segmentation, we first introduce a fully dense neural network with an encoder-decoder structure, abbreviated as FDNet, in which the feature maps of all previous blocks are adaptively aggregated and fed forward as input. On the one hand, it reconstructs spatial boundaries accurately. On the other hand, it learns more efficiently thanks to more efficient gradient backpropagation. We then present a joint multi-task learning framework for semantic segmentation and semantic boundary detection. The critical component of the framework is the iterative pyramid context module (PCM), which couples the two tasks and stores the shared latent semantics through which they interact. A novel loss function derived from the dual constraint is designed to further improve semantic segmentation performance by ensuring consistency between the semantic mask boundary and the boundary ground truth. The proposed method is able to generate accurate mask and boundary estimates simultaneously.
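The idea of checking a segmentation mask's boundary against a boundary ground truth can be sketched as follows. This is only an illustrative stand-in, not the loss defined in the thesis: it assumes a hard label map, marks a pixel as boundary when it differs from a horizontal or vertical neighbour, and measures disagreement with a mean absolute difference. The names `mask_boundary` and `boundary_consistency` are hypothetical.

```python
import numpy as np


def mask_boundary(labels: np.ndarray) -> np.ndarray:
    """Mark pixels whose label differs from a 4-connected neighbour."""
    boundary = np.zeros(labels.shape, dtype=bool)
    diff_h = labels[:, :-1] != labels[:, 1:]  # left/right neighbours differ
    diff_v = labels[:-1, :] != labels[1:, :]  # upper/lower neighbours differ
    # Flag both pixels on either side of each label change.
    boundary[:, :-1] |= diff_h
    boundary[:, 1:] |= diff_h
    boundary[:-1, :] |= diff_v
    boundary[1:, :] |= diff_v
    return boundary


def boundary_consistency(labels: np.ndarray, boundary_gt: np.ndarray) -> float:
    """Mean absolute difference between the mask-derived boundary and
    a binary boundary ground truth (0.0 means they agree everywhere)."""
    pred = mask_boundary(labels).astype(float)
    return float(np.abs(pred - boundary_gt.astype(float)).mean())


# Toy example: two classes split down the middle of a 3x4 image.
labels = np.array([[0, 0, 1, 1]] * 3)
boundary_gt = mask_boundary(labels)  # a ground truth that agrees exactly
penalty = boundary_consistency(labels, boundary_gt)
```

A trainable version would need a differentiable boundary extraction from soft class probabilities; the hard comparison used here only conveys the consistency check itself.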

For salient object detection, we propose a novel network built on a superpixel hierarchical graph, which guides the exchange of context between superpixels. Instead of dividing an image evenly and fixedly into pixels or patches as existing methods do, we take arbitrary-shaped superpixels as nodes and adaptively construct a hierarchical graph with three levels. Each superpixel first aggregates context information from lower-level superpixels or pixels, and the context message then flows along the hierarchical graph. In addition, we present an end-to-end differentiable morphological active contour model, which iteratively improves the boundary accuracy of salient objects. Thorough experiments show that the proposed method achieves better performance than other methods.

For unsupervised video object segmentation, we propose a discriminative feature network to capture the correlation between frames. To model the long-term dependency of video images, the learned discriminative features, which are extracted from all input images, are used to establish correspondence with all features of the test image under a conditional random field (CRF) formulation, which is leveraged to enforce consistency between pixels. The proposed method is able to capture and mine the underlying relations between images and discover the common foreground objects.

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Computer Science and Engineering
Supervisors: Quan, Long
Authors: Zhen, Mingmin
Subjects: Image processing Mathematical models Image segmentation
Language: English
Call number: Thesis CSED 2020 Zhen
DOI: 10.14711/thesis-991012862969803412
