THESIS
2020
xviii, 129 pages : illustrations ; 30 cm
Abstract
Cameras are of vital importance for autonomous robots thanks to their light weight,
rich information, and low power consumption. Over the past decades, researchers have
focused on multi-camera systems, e.g. stereo cameras, to generate dense depth maps and further
build 3D maps. Although widely used in many robotic systems, multi-camera systems
require careful calibration and maintenance. The baseline requirement also makes them
unsuitable for micro-robots. In this thesis, we present a family of methods that densely
estimate depth maps and build a globally consistent 3D map using only a single moving,
localized monocular camera. To estimate dense depth maps, our method first exploits the
multi-baseline observations of pixels and then fuses the sequentially estimated depth maps in
a probabilistic way. The quadtree structure similarity between RGB and depth images is
also exploited to accelerate the belief propagation. With the development of deep learning,
we propose a convolutional neural network to directly solve the depth map from multi-view
observations. Trained on RGB-D datasets, the learned network produces smoother and
more accurate depth estimation compared with traditional methods. We also show that
geometric information can be learned in a self-supervised manner from the almost unlimited
supply of monocular videos on the Internet. Networks pretrained with the proposed method
outperform ImageNet-pretrained networks in both accuracy and generalization ability.
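The self-supervised signal mentioned above is commonly a photometric reprojection loss: predicted depth and a relative camera pose warp one frame into another, and the intensity difference supervises the network. A minimal sketch of that loss, assuming grayscale images, known intrinsics, and nearest-neighbour sampling (the thesis's exact formulation may differ):

```python
import numpy as np

def reproject(depth, K, T_src_tgt):
    """Back-project target pixels with predicted depth, transform into the
    source frame, and project to source pixel coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x N
    rays = np.linalg.inv(K) @ pix                  # rays on the unit plane
    pts = rays * depth.reshape(1, -1)              # 3D points in target frame
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_src = (T_src_tgt @ pts_h)[:3]              # points in source frame
    proj = K @ pts_src
    return proj[:2] / proj[2:3]                    # 2 x N source pixel coords

def photometric_loss(tgt_img, src_img, depth, K, T_src_tgt):
    """Mean L1 photometric error between the target image and the source
    image warped into the target view (nearest-neighbour sampling)."""
    h, w = depth.shape
    uv = np.rint(reproject(depth, K, T_src_tgt)).astype(int)
    u, v = uv[0].reshape(h, w), uv[1].reshape(h, w)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.zeros_like(tgt_img)
    warped[valid] = src_img[v[valid], u[valid]]
    return np.abs(tgt_img - warped)[valid].mean()
```

With correct depth and pose the warped source matches the target and the loss vanishes; in training, gradients through the depth (and pose) predictions provide the supervision, with no ground-truth depth required.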
Observing the relationship between camera poses and pixel correspondences, we propose a
network that solves both dense pixel correspondences and the camera pose in an alternating
optimization so that both components benefit from each other. With solved pixel
correspondences and camera poses, depth maps can be triangulated easily in a
network. Lastly, we propose a depth map fusion method that uses surfel representation.
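A surfel (surface element) stores a position, normal, radius, and confidence weight, and fusion is typically a confidence-weighted running average. A minimal sketch of such an update; the class and update rule here are illustrative assumptions, not the thesis's exact model:

```python
import numpy as np

class Surfel:
    """A single surfel: position, normal, radius, and confidence weight."""
    def __init__(self, pos, normal, radius, weight=1.0):
        self.pos = np.asarray(pos, dtype=float)
        self.normal = np.asarray(normal, dtype=float)
        self.radius = float(radius)
        self.weight = float(weight)

    def fuse(self, pos, normal, radius, weight=1.0):
        """Confidence-weighted running average with a new measurement."""
        w = self.weight + weight
        self.pos = (self.weight * self.pos + weight * np.asarray(pos)) / w
        n = self.weight * self.normal + weight * np.asarray(normal)
        self.normal = n / np.linalg.norm(n)        # re-normalize the normal
        self.radius = min(self.radius, float(radius))  # keep the finer radius
        self.weight = w
```

Repeated observations increase a surfel's weight, so stable geometry dominates while noisy measurements are averaged out; map deformation can then move surfels directly instead of re-integrating depth.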
With surfel-based fusion, a globally consistent map can be built with online deformation
enabling robots to navigate in large-scale complex environments.
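The triangulation mentioned above (recovering a 3D point once pixel correspondences and camera poses are known) reduces to the standard linear DLT method. A minimal two-view sketch; the function name and toy geometry are illustrative:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel observations."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # null vector of the stacked constraints
    return X[:3] / X[3]        # homogeneous -> Euclidean
```

Each correspondence contributes two linear constraints per view; the SVD null vector of the stacked system is the homogeneous 3D point, and its depth in each camera follows directly.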