THESIS
2020
xvii, 117 pages : illustrations ; 30 cm
Abstract
Geometric image matching requires establishing sparse correspondences between 2D points, from
which the camera geometry is recovered and the static scene structure is reconstructed. In
essence, the performance of a broad range of computer vision applications, including panorama
stitching, visual localization, Structure-from-Motion (SfM), Simultaneous Localization and
Mapping (SLAM), Augmented Reality (AR) and 3D reconstruction, depends heavily on finding
reliable correspondences. During the past decade, hand-crafted keypoint features and engineered
feature matchers have been the de facto standard in general-purpose image matching pipelines.
Despite their apparent success, these traditional methods are still known to struggle to
identify reliable correspondences under large illumination or perspective changes, a limitation
that has become the bottleneck for achieving better spatial understanding in 3D.
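For readers unfamiliar with engineered feature matchers, the sketch below shows one widely used heuristic: mutual nearest-neighbour matching combined with Lowe's ratio test. It is an illustrative baseline of the kind described above, not a method proposed in this thesis; the helper name `match_descriptors` and the default ratio of 0.8 are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(
    desc1: Sequence[Sequence[float]],
    desc2: Sequence[Sequence[float]],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: mutual nearest neighbours that also pass the ratio test.

    Returns (i, j) index pairs into desc1 and desc2.
    """
    if not desc1 or len(desc2) < 2:
        return []  # the ratio test needs at least two candidates
    # Brute-force distance matrix; fine for a sketch, too slow for real image sizes.
    d = [[_dist(a, b) for b in desc2] for a in desc1]
    # For every descriptor in image 2, its nearest neighbour back in image 1.
    nn21 = [min(range(len(desc1)), key=lambda i: d[i][j]) for j in range(len(desc2))]
    matches: List[Tuple[int, int]] = []
    for i, row in enumerate(d):
        best, second = sorted(range(len(desc2)), key=lambda j: row[j])[:2]
        # Ratio test: the best candidate must be clearly closer than the runner-up.
        if row[best] >= ratio * row[second]:
            continue
        # Mutual check: i must also be the nearest neighbour of its best match.
        if nn21[best] == i:
            matches.append((i, best))
    return matches


if __name__ == "__main__":
    print(match_descriptors([[0, 0], [10, 10]], [[0.1, 0], [10, 10.2], [5, 5]]))
    # [(0, 0), (1, 1)]
```

Such fixed rules discard ambiguous matches (for example, a repeated texture where two candidates are almost equally close) but have no notion of scene context, which is one reason they degrade under the large appearance changes mentioned above.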
With the emergence of deep learning, considerable effort has been devoted to reformulating
each component of image matching with modern neural network architectures, which can be
optimized efficiently in a data-driven and differentiable manner. In this thesis, we first
review recent achievements in learning-based image matching, then reveal the substantial
challenges arising from practical use, and finally elaborate on the methods we have proposed,
which achieve state-of-the-art results on several important benchmark datasets.
More specifically, we decompose the learning-based image matching pipeline into four
learnable sub-modules. First, a local feature extractor consisting of 1) a keypoint detector
and 2) a keypoint descriptor, for which we address the accuracy of keypoint localization, the
efficiency of training data sampling, the aggregation of contextual information, and the
advantage of jointly learning detection and description. Next, 3) a specialized image retrieval
system for SfM tasks, which shortlists matching candidates from a large image collection and
identifies geometric image overlaps even without clearly defined semantics. Lastly, 4) a feature
matcher that rejects outlier correspondences when solving two-view geometric models and
leverages spatial context, such as motion consistency, from the input correspondences.
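The candidate-shortlisting role of the retrieval module (3) can be illustrated with a minimal sketch, assuming each image is already summarized by a single global descriptor vector: for every image, keep the k most similar other images by cosine similarity and pass only those pairs on to feature matching. The helper name `shortlist_pairs`, the toy descriptors and the value of k are assumptions for illustration; the retrieval system in this thesis is learned and targets geometric overlap, which this plain similarity ranking does not model.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def shortlist_pairs(
    global_desc: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep, for each image, its k most similar images.

    Returns sorted, de-duplicated (name_a, name_b) pairs with name_a < name_b.
    """
    names = sorted(global_desc)
    unit = {n: _normalize(global_desc[n]) for n in names}
    pairs: Set[Tuple[str, str]] = set()
    for a in names:
        # Cosine similarity = dot product of unit-length descriptors.
        scores = [
            (sum(x * y for x, y in zip(unit[a], unit[b])), b) for b in names if b != a
        ]
        # Highest similarity first; ties broken by name for determinism.
        scores.sort(key=lambda s: (-s[0], s[1]))
        for _, b in scores[:k]:
            pairs.add((a, b) if a < b else (b, a))
    return sorted(pairs)


if __name__ == "__main__":
    desc = {"a": [1, 0], "b": [0.9, 0.1], "c": [0, 1], "d": [0.1, 0.9]}
    print(shortlist_pairs(desc, k=1))  # [('a', 'b'), ('c', 'd')]
```

Restricting matching to shortlisted pairs reduces the number of image pairs from quadratic in the collection size to roughly k per image, which is why the quality of the shortlist directly bounds what SfM can later reconstruct.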
To facilitate this research, we also present a large-scale dataset built with an automatic
pipeline that generates rich and accurate geometric training labels from well-reconstructed 3D
models. The proposed methods have been integrated into several important applications and, in
particular, evaluated in the context of visual 3D modelling, where they demonstrate substantial
improvements and strong generalization ability.
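One simple way geometric labels can be derived from a reconstructed 3D model is sketched below, assuming known pinhole intrinsics K and poses (R, t) for two cameras: a 3D point that projects inside both images yields one ground-truth 2D-2D correspondence. The helper names, the toy cameras and the image size are assumptions for illustration, and the sketch ignores occlusion, which a real labelling pipeline would have to handle (for example, by checking against rendered depth).

```python
from typing import List, Optional, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Point2 = Tuple[float, float]


def project(K: Mat3, R: Mat3, t: Sequence[float], X: Sequence[float]) -> Optional[Point2]:
    """Pinhole projection x ~ K (R X + t); None if the point is behind the camera."""
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if Xc[2] <= 0:
        return None
    u = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    return (u[0] / u[2], u[1] / u[2])


def correspondences_from_points(
    points: Sequence[Sequence[float]],
    cam1: Tuple[Mat3, Mat3, Sequence[float]],
    cam2: Tuple[Mat3, Mat3, Sequence[float]],
    width: int,
    height: int,
) -> List[Tuple[Point2, Point2]]:
    """Hypothetical helper: a 3D point visible in both images yields one match."""

    def inside(p: Optional[Point2]) -> bool:
        return p is not None and 0 <= p[0] < width and 0 <= p[1] < height

    matches: List[Tuple[Point2, Point2]] = []
    for X in points:
        p1, p2 = project(*cam1, X), project(*cam2, X)
        if inside(p1) and inside(p2):
            matches.append((p1, p2))  # occlusion is NOT checked in this sketch
    return matches


if __name__ == "__main__":
    K = [[100, 0, 50], [0, 100, 50], [0, 0, 1]]
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    cam1, cam2 = (K, I, [0, 0, 0]), (K, I, [-1, 0, 0])  # second camera shifted along x
    pts = [[0, 0, 5], [0, 0, -5], [10, 0, 5]]
    print(correspondences_from_points(pts, cam1, cam2, 100, 100))
    # [((50.0, 50.0), (30.0, 50.0))]
```

Because labels come from the 3D model rather than from a matcher, they remain available exactly in the hard cases (large viewpoint or illumination changes) where hand-crafted matching fails.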