THESIS
2017
xii, 65 pages : illustrations ; 30 cm
Abstract
In recent years, the emergence of reliable and low-cost RGB-D sensors (e.g., Microsoft
Kinect) has extended the dimension of a single image from 2D to 3D. With the aid of the
additional depth channel, 3D spatial information complements the 2D image plane, boosting
many computer vision tasks. In this thesis, we present research on extending two vision
tasks, shadow removal and object proposal generation, from RGB to single RGB-D images.
Shadow removal is a classical and challenging computer vision problem. We first propose
an automatic method to remove shadows from single RGB-D images. Using normal cues derived
directly from depth, we can remove both hard and soft shadows while preserving surface
texture and shading. Our key assumption is that pixels with similar normals, spatial locations,
and chromaticity should have similar colors. A modified nonlocal matching is used to compute
a shadow confidence map that localizes hard shadow boundaries well, thus handling hard and
soft shadows within the same framework. The detected shadows are then removed by a
constrained linear optimization that reconstructs a shadow-free image. We compare our results
with those of state-of-the-art shadow removal methods on single RGB images and with intrinsic
image decomposition on standard RGB-D datasets.
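The key assumption above can be illustrated with a toy sketch. This is not the thesis's algorithm; it is a minimal, hypothetical version of nonlocal matching in which each pixel's feature vector stands in for its normal, spatial location, and chromaticity, and a pixel that is much darker than its nonlocal matches receives high shadow confidence:

```python
import numpy as np

def shadow_confidence(intensity, features, sigma=0.2):
    """Toy nonlocal matching: weight every other pixel by feature
    similarity, then compare this pixel's intensity against the
    weighted average of its matches. A pixel much darker than its
    matches gets high shadow confidence."""
    n = len(intensity)
    conf = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(features - features[i], axis=1)
        w = np.exp(-(d ** 2) / (2 * sigma ** 2))
        w[i] = 0.0  # exclude the pixel itself
        if w.sum() == 0:
            continue
        expected = np.dot(w, intensity) / w.sum()
        conf[i] = max(0.0, expected - intensity[i])
    return conf

# Four pixels with near-identical features; pixel 2 is much darker (shadowed).
feats = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [0.01, 0.01]])
inten = np.array([0.8, 0.8, 0.2, 0.8])
conf = shadow_confidence(inten, feats)
```

In this example only the darkened pixel receives a large confidence value; the well-lit pixels match their neighbors and score near zero.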
Our second task is to generate object proposals from RGB-D images. Before that, we
present a novel method to produce proposals for 2D images. Object proposals are the potential
object candidates in the detection pipeline, and the distance metric plays a key role in
grouping superpixels to produce proposals for object detection. We observe that existing
distance metrics work well primarily in low-complexity cases. In this thesis, we develop a
novel distance metric for grouping two superpixels in high-complexity scenarios. Combining
the two, we obtain a complexity-adaptive distance measure that achieves improved grouping
across different levels of complexity. Extensive experiments show that our method achieves
good results on the PASCAL VOC 2012 dataset, surpassing the latest state-of-the-art methods.
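The idea of a complexity-adaptive measure can be sketched as a convex blend of two metrics. The function below is an illustrative assumption, not the thesis's formulation: a simple color distance handles low-complexity regions, a richer feature distance takes over as a local complexity score (in [0, 1]) grows:

```python
import numpy as np

def adaptive_distance(color_a, color_b, feat_a, feat_b, complexity):
    """Blend a simple color distance (adequate for low-complexity
    regions) with a richer feature distance as complexity grows.
    `complexity` in [0, 1] is a stand-in for a local complexity score."""
    d_low = np.linalg.norm(np.asarray(color_a) - np.asarray(color_b))
    d_high = np.linalg.norm(np.asarray(feat_a) - np.asarray(feat_b))
    return (1.0 - complexity) * d_low + complexity * d_high

# At complexity 0 only the color term matters; at 1 only the feature term.
d0 = adaptive_distance([0, 0, 0], [1, 0, 0], [0, 0], [3, 4], 0.0)  # 1.0
d1 = adaptive_distance([0, 0, 0], [1, 0, 0], [0, 0], [3, 4], 1.0)  # 5.0
```

A superpixel-grouping loop would merge the pair with the smallest adaptive distance at each step, so the blend weight directly controls which cue drives the grouping.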
Next we focus on the task of extracting 3D region proposals from indoor RGB-D images,
which aims to produce bounding boxes of candidate objects. A 3D voxel grid contains a large
amount of redundant space. To rule out less-informative voxels and to simplify the problem,
we introduce a space compression procedure that squashes the 3D space into a 2D "tile" grid.
After each tile is layered individually in the vertical direction, we propose Structural
Constrained Parametric Min-Cuts (S-CPMC) to group the tiled space. The extracted tiles are
further processed to reconstruct 3D bounding boxes through a geodesic distance transform
(GDT) applied to the generated tile hypotheses. Finally, the object hypotheses are ranked by
a trained ranker. Experiments show that our algorithm achieves results comparable to the
state-of-the-art on the SUN RGB-D dataset.
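The space compression step can be pictured with a minimal sketch. This is an assumed, simplified reading of the procedure: collapse an occupancy voxel grid along the vertical axis so each 2D tile records how many vertical layers are occupied, which exposes empty columns for pruning:

```python
import numpy as np

def compress_to_tiles(voxels):
    """Squash an occupancy voxel grid (axis 2 = vertical) into a 2D
    "tile" grid; each tile holds the count of occupied vertical layers.
    Tiles with a zero count are redundant space and can be ruled out."""
    return voxels.sum(axis=2)

vox = np.zeros((4, 4, 8), dtype=int)
vox[1, 1, 0:3] = 1   # an object spanning three vertical layers
vox[2, 3, 5] = 1     # a thin object one layer high
tiles = compress_to_tiles(vox)
```

In the full pipeline the per-tile vertical layering would then feed the grouping stage; this sketch only shows the dimensionality reduction itself.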