THESIS
2022
1 online resource (xiv, 104 pages) : illustrations (chiefly color)
Abstract
In recent decades, with the prevalence of affordable RGB-D cameras and LiDAR (Light Detection and Ranging) scanners, the point cloud representation has become increasingly practical and popular in many computer vision applications, such as structure from motion (SfM) and simultaneous localization and mapping (SLAM). Since a single acquisition of a point cloud is usually limited to one perspective, several acquisitions are required to cover the whole area of interest. Point cloud registration is then applied as a fundamental step to find an optimal transformation between the partially overlapping point cloud fragments and recover the complete underlying geometry. Subsequently, with the reconstructed 3D model represented as point clouds, semantic scene understanding is necessary for many applications, such as autonomous driving and augmented reality. In this thesis, we present our contributions to solving these two problems, namely, point cloud registration and point cloud semantic understanding.
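As background, the "optimal transformation" sought in registration is typically a least-squares rigid transform between corresponding points. A minimal sketch of the classical SVD-based (Kabsch) solution, assuming correspondences are already known (this illustrates the standard closed-form step, not the learned pipeline proposed in the thesis):

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q.

    P, Q: (N, 3) arrays of corresponding points. Classical SVD-based
    (Kabsch) solution; assumes correspondences are given.
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:         # reject reflections, keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t
```

In practice the hard part is obtaining reliable correspondences between partially overlapping fragments; that is what the detector, descriptor, and filtering sub-modules described below address.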
First, we present two methods for efficient and robust point cloud registration, based on the key idea of decomposing the point cloud registration pipeline into three learnable sub-modules. In the first method, we design a keypoint detector sub-module and a keypoint descriptor sub-module for efficient local feature extraction, emphasizing the importance of a reliable keypoint detector and demonstrating the superiority of jointly learning the detection and description tasks. In the second method, we propose a correspondence filtering sub-module to improve robustness to large outlier ratios. We explicitly incorporate the spatial consistency constraint imposed by rigid transformations to prune outlier correspondences.
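The spatial consistency constraint exploits the fact that rigid transformations preserve pairwise distances: for two inlier correspondences i and j, the distance between the source points should match the distance between their matched target points. A minimal sketch of this scoring idea, assuming putative correspondences P[i] &lt;-&gt; Q[i] and a hypothetical distance tolerance tau (an illustrative baseline, not the learned filtering sub-module proposed in the thesis):

```python
import numpy as np

def spatial_consistency_scores(P, Q, tau=0.1):
    """Score each putative correspondence (P[i] <-> Q[i]) by how many
    other correspondences it is distance-consistent with.

    Rigid transforms preserve pairwise distances, so for two inlier
    correspondences i, j we expect | ||P_i - P_j|| - ||Q_i - Q_j|| |
    to be small; outliers rarely agree with many other matches.
    """
    dP = np.linalg.norm(P[:, None] - P[None], axis=-1)  # (N, N) source distances
    dQ = np.linalg.norm(Q[:, None] - Q[None], axis=-1)  # (N, N) target distances
    consistent = np.abs(dP - dQ) < tau                  # pairwise compatibility
    return consistent.sum(axis=1) - 1                   # exclude self-match
```

Correspondences with low scores can then be pruned before estimating the transformation.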
Second, we study the problem of semantic understanding on registered point clouds. Specifically, we propose a LiDAR-camera fusion solution for 3D perception. Our studies investigate the inherent difficulties of LiDAR-camera fusion and reveal a crucial aspect of robust fusion, namely, the soft-association mechanism. The proposed module is integrated into object detection and multiple object tracking frameworks and can be easily extended to other tasks such as LiDAR semantic segmentation.
In summary, we develop three learning-based methods for robust point cloud registration and semantic parsing of registered point clouds. The proposed methods have been extensively evaluated on standardized benchmarks, demonstrating superior performance and strong generalization ability.