THESIS
2019
xiii, 103 pages : illustrations ; 30 cm
Abstract
Data association, in the context of Structure-from-Motion (SfM) and Simultaneous Localization
and Mapping (SLAM), is the process of associating uncertain measurements (e.g., image
pixels, local descriptors, and 3D tracks) to the same object or identity. It forms the foundation
of many 3D computer vision problems, from finding local feature correspondences and
identifying overlapping images to bundle adjustment and related graph-based optimization
problems that seek a solution consistent in both geometric and photometric
terms. Unlike deterministic pose estimation algorithms, which typically have closed-form
solutions, data association usually operates in a noisy setting and has no analytical
form, yet it strongly affects the efficiency and accuracy of the reconstruction. This thesis
explores the elements of the data association problem in the context of 3D reconstruction and
related issues. More specifically, we first give a thorough overview of the state-of-the-art SfM
pipeline, with a focus on the role of data association in each of its sub-steps. We then
describe three novel methods for solving data association in SfM-related 3D computer vision
problems.
First, we propose a learning-based algorithm for the efficient and accurate association of
similar images that depict the same scene, which often serves as the first step in a large-scale 3D
reconstruction to accelerate the later image matching pipeline. Though Convolutional Neural
Networks (CNNs) have achieved superior performance on object image retrieval, Bag-of-Words
(BoW) models with handcrafted local features still dominate the retrieval of overlapping images
in 3D reconstruction. We narrow this gap by presenting an efficient CNN-based method for
retrieving overlapping images, a task we refer to as the matchable image retrieval problem.
We propose a batched triplet-based loss function combined with mesh re-projection to learn
the CNN representation effectively. The proposed method significantly accelerates the image
retrieval process in 3D reconstruction and outperforms the state-of-the-art CNN-based and BoW
methods for matchable image retrieval.
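
As a rough illustration of a batched triplet-based loss, the sketch below computes a batch-hard triplet loss in plain Python. The mining strategy (hardest positive and hardest negative per anchor), the margin value, and the names `batch_hard_triplet_loss` and `overlaps` are assumptions made for this example and are not taken from the thesis; in particular, the overlap labels are simply given here, whereas the thesis derives its supervision with mesh re-projection.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, overlaps, margin=0.5):
    """Mean batch-hard triplet loss over one batch (illustrative sketch).

    embeddings: list of feature vectors, one per image in the batch.
    overlaps:   overlaps[i][j] is True when images i and j are assumed
                to depict overlapping content (hypothetical labels).
    """
    n = len(embeddings)
    losses = []
    for a in range(n):
        pos = [euclidean(embeddings[a], embeddings[j])
               for j in range(n) if j != a and overlaps[a][j]]
        neg = [euclidean(embeddings[a], embeddings[j])
               for j in range(n) if j != a and not overlaps[a][j]]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        # Hardest positive is the farthest one; hardest negative the closest.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

With well-separated embeddings the loss is zero; it becomes positive only when a non-overlapping image lies within `margin` of the farthest overlapping one.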
Building on pairwise image matching, we present a match graph construction method that
tackles the issues of completeness, efficiency, and consistency in a unified framework. Pairwise
image matching of unordered image collections greatly affects the efficiency and accuracy of
SfM. Insufficient match pairs may result in disconnected structures or incomplete components,
while redundant pairs are costly and, when they include erroneous matches, may lead to folded
and superimposed structures. Our approach starts by chaining all non-singleton images using a
minimum spanning tree based on visual similarity. The tree is then incrementally expanded
to form locally consistent strong triplets. Finally, a global community-based graph algorithm
is introduced to strengthen global consistency by reinforcing potentially dominant connected
components. We demonstrate the superior performance of our method in terms of accuracy and
efficiency on both benchmark and Internet datasets. The method also performs well
on challenging datasets containing highly ambiguous and duplicated scenes.
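
The first step of this pipeline, chaining images with a similarity-based spanning tree, can be sketched with Kruskal's algorithm. This is a minimal sketch under stated assumptions: the edge cost is taken to be the negated similarity score, and the function name `similarity_spanning_tree` and the input format are hypothetical. The triplet expansion and community-based steps are not shown.

```python
def similarity_spanning_tree(num_images, scored_pairs):
    """Kruskal's algorithm over candidate image pairs (illustrative sketch).

    num_images:   number of images, indexed 0 .. num_images - 1.
    scored_pairs: iterable of (i, j, similarity) candidate pairs.
    Returns the selected tree edges as (i, j) tuples. Images that appear
    in no candidate pair (singletons) are simply left unconnected, and a
    disconnected candidate graph yields a spanning forest.
    """
    parent = list(range(num_images))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Visiting pairs by descending similarity equals ascending cost
    # when cost is defined as -similarity (an assumption of this sketch).
    for i, j, _sim in sorted(scored_pairs, key=lambda e: -e[2]):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For example, with five images and candidate pairs `(0, 1, 0.9)`, `(1, 2, 0.8)`, `(0, 2, 0.7)`, `(2, 3, 0.4)`, the pair `(0, 2)` is skipped because it would close a cycle, and image 4 remains a singleton.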
The data association problem also arises in other domains of 3D reconstruction.
We describe contributions to two related problems, namely generating consistent textures
in image-based modeling, and estimating relative camera poses by exploiting the interplay of
photometric and geometric information. The first shares the same graph structure as the
large-scale SfM problem, while the second combines traditional geometric motion estimation
with recent learning-based methods. We bridge the gap between geometric
loss and photometric loss by introducing a matching loss constrained by epipolar geometry
in a self-supervised framework. Evaluated on the KITTI dataset, the method outperforms the
state-of-the-art unsupervised ego-motion estimation methods by a large margin. We conclude
the thesis by laying out future directions for data association across different types of
information sources.
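
For readers unfamiliar with the epipolar constraint mentioned above, the sketch below computes the standard point-to-epipolar-line distance for a match, given a fundamental matrix `F` satisfying x2^T F x1 = 0 for correct correspondences. It only illustrates the geometric residual that such a constraint penalizes; it is not the matching loss defined in the thesis, and the function names are hypothetical.

```python
import math


def epipolar_distance(F, x1, x2):
    """Distance from x2 (image 2) to the epipolar line of x1 (image 1).

    F:  3x3 fundamental matrix as nested lists, with x2^T F x1 = 0
        for a perfect correspondence.
    x1: (u, v) point in image 1; x2: (u, v) point in image 2.
    """
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    # Epipolar line l = F p1 in image 2, as (a, b, c) with ax + by + c = 0.
    line = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        return 0.0  # degenerate line; treated as uninformative in this sketch
    return abs(sum(p2[k] * line[k] for k in range(3))) / norm


def mean_epipolar_residual(F, matches):
    """Average epipolar distance over a list of (x1, x2) matches."""
    if not matches:
        return 0.0
    return sum(epipolar_distance(F, a, b) for a, b in matches) / len(matches)
```

For a pure sideways translation with identity intrinsics, `F` reduces to the skew-symmetric matrix of the translation, epipolar lines are horizontal, and the residual is simply the difference in the vertical coordinate of the two points.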