Efficient and accurate data association in large-scale structure-from-motion and beyond

by Tianwei Shen

THESIS 2019

Ph.D. Computer Science and Engineering

xiii, 103 pages : illustrations ; 30 cm

Abstract

Data association, in the context of Structure-from-Motion (SfM) and Simultaneous Localization and Mapping (SLAM), is the process of associating uncertain measurements (e.g., image pixels, local descriptors, and 3D tracks) with the same object or identity. It forms the foundation of many 3D computer vision problems, from finding local feature correspondences and identifying overlapping images up to bundle adjustment and related graph-based optimization problems that seek geometric and photometric consistency. Unlike deterministic pose estimation algorithms, which typically have closed-form solutions, data association usually operates in a noisy setting and has no analytical form, and its quality strongly affects the efficiency and accuracy of the reconstruction. This thesis explores the elements of the data association problem in the context of 3D reconstruction and related issues. More specifically, we first give a thorough overview of the state-of-the-art SfM pipeline, with a focus on the role of data association in each of its sub-steps. We then describe three novel methods for solving data association in SfM-related 3D computer vision problems.

First, we propose a learning-based algorithm for the efficient and accurate association of similar images that depict the same scene, which often serves as the first step in large-scale 3D reconstruction to accelerate the subsequent image matching pipeline. Though Convolutional Neural Networks (CNNs) have achieved superior performance on object image retrieval, Bag-of-Words (BoW) models with handcrafted local features still dominate the retrieval of overlapping images in 3D reconstruction. We narrow this gap by presenting an efficient CNN-based method to retrieve overlapping images, a task we refer to as the matchable image retrieval problem. We propose a batched triplet-based loss function combined with mesh re-projection to learn the CNN representation effectively. The proposed method significantly accelerates the image retrieval process in 3D reconstruction and outperforms the state-of-the-art CNN-based and BoW methods for matchable image retrieval.
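
As a rough illustration of what a triplet-based loss over a batch can look like, the sketch below averages a margin hinge over every (anchor, positive, negative) triplet in a batch. It is a generic formulation and not the thesis's loss: the function names, the `overlaps` matrix, the Euclidean distance, and the margin value are all assumptions made for this example, and the mesh re-projection step used to decide which images overlap is not modeled.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_triplet_loss(
    embeddings: List[Sequence[float]],
    overlaps: List[List[bool]],
    margin: float = 0.5,  # hypothetical margin, chosen only for illustration
) -> float:
    """Mean hinge loss over all valid triplets in one batch.

    overlaps[i][j] is True when images i and j are assumed to overlap
    (a positive pair); every other image is a negative for anchor i.
    """
    n = len(embeddings)
    losses = []
    for a in range(n):
        for p in range(n):
            if p == a or not overlaps[a][p]:
                continue  # p must be a positive for anchor a
            for q in range(n):
                if q == a or overlaps[a][q]:
                    continue  # q must be a negative for anchor a
                d_ap = euclidean(embeddings[a], embeddings[p])
                d_an = euclidean(embeddings[a], embeddings[q])
                # Penalize positives that are not closer than negatives by `margin`.
                losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Images 0 and 1 overlap; image 2 shows a different scene.
    overlaps = [
        [False, True, False],
        [True, False, False],
        [False, False, False],
    ]
    print(batch_triplet_loss([[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]], overlaps))
```

The loss is zero when every overlapping pair is already closer than every non-overlapping pair by at least the margin, and positive otherwise.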

Building on pairwise image matching, we present a match graph construction method that tackles the issues of completeness, efficiency, and consistency in a unified framework. Pairwise image matching of unordered image collections greatly affects the efficiency and accuracy of SfM. Insufficient match pairs may result in disconnected structures or incomplete components, while redundant pairs are costly and may include erroneous matches that lead to folded and superimposed structures. Our approach starts by chaining all but singleton images using a visual-similarity-based minimum spanning tree. Then the minimum spanning tree is incrementally expanded to form locally consistent strong triplets. Finally, a global community-based graph algorithm is introduced to strengthen global consistency by reinforcing potentially dominant connected components. We demonstrate the superior performance of our method in terms of accuracy and efficiency on both benchmark and Internet datasets. This method also performs remarkably well on challenging datasets with highly ambiguous and duplicated scenes.
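
The first step described above, chaining images with a visual-similarity-based minimum spanning tree, can be sketched with Kruskal's algorithm. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name, the `1 - similarity` edge cost, and the `min_similarity` cutoff are hypothetical, and the later triplet expansion and community-based steps are not shown.

```python
from typing import Dict, List, Tuple


def similarity_spanning_forest(
    num_images: int,
    similarities: Dict[Tuple[int, int], float],
    min_similarity: float = 0.0,  # hypothetical cutoff below which a pair is ignored
) -> List[Tuple[int, int]]:
    """Kruskal's algorithm with edge cost 1 - similarity.

    Images with no sufficiently similar partner (singletons) get no edge.
    """
    parent = list(range(num_images))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Cheapest edges first, i.e. the most similar image pairs first.
    candidates = sorted(
        (1.0 - s, i, j)
        for (i, j), s in similarities.items()
        if s > min_similarity
    )
    tree: List[Tuple[int, int]] = []
    for _cost, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


if __name__ == "__main__":
    # Image 4 has no similar partner and stays a singleton.
    sims = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7, (2, 3): 0.6}
    print(similarity_spanning_forest(5, sims))
```

The tree uses the fewest edges that connect all non-singleton images, which is why it is a natural starting point before more match pairs are added.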

The data association problem also arises widely in other domains of 3D reconstruction. We describe our contributions to two related problems, namely generating consistent textures in image-based modeling, and estimating relative camera poses via the interplay of photometric and geometric information. The first shares the same graph structure with the large-scale SfM problem, while the second combines traditional geometric motion estimation methods with the recent trend of learning-based methods. We bridge the gap between geometric loss and photometric loss by introducing a matching loss constrained by epipolar geometry in a self-supervised framework. Evaluated on the KITTI dataset, the method outperforms the state-of-the-art unsupervised ego-motion estimation methods by a large margin. We conclude the thesis by laying out future directions for data association with different types of information sources.
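
To make the epipolar constraint mentioned above concrete, the sketch below computes the standard algebraic epipolar residual for a candidate match given a relative pose. It shows only the textbook constraint, not the thesis's matching loss. The helper names are hypothetical, and the code assumes normalized (calibrated) homogeneous image coordinates and the pose convention X2 = R X1 + t.

```python
from typing import List, Sequence

Mat3 = List[List[float]]


def skew(t: Sequence[float]) -> Mat3:
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def essential_matrix(rotation: Mat3, translation: Sequence[float]) -> Mat3:
    """E = [t]_x R for the assumed convention X2 = R X1 + t."""
    return matmul(skew(translation), rotation)


def epipolar_residual(essential: Mat3, x1: Sequence[float], x2: Sequence[float]) -> float:
    """Algebraic residual x2^T E x1; zero for a geometrically consistent match."""
    e_x1 = [sum(essential[i][k] * x1[k] for k in range(3)) for i in range(3)]
    return sum(x2[i] * e_x1[i] for i in range(3))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    E = essential_matrix(identity, [1.0, 0.0, 0.0])  # sideways camera motion
    # The 3D point (0, 0, 5) projects to x1 in view 1 and to x2 in view 2.
    x1 = [0.0, 0.0, 1.0]
    x2 = [0.2, 0.0, 1.0]
    print(epipolar_residual(E, x1, x2))
```

A loss built on this idea would penalize predicted poses whose residuals are large for matched points; how the thesis weights or combines such terms is not stated in the abstract.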


Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Computer Science and Engineering
Supervisors: Quan, Long
Authors: Shen, Tianwei
Subjects: Image processing Data processing Photogrammetry Computer vision
Language: English
Call number: Thesis CSED 2019 Shen
DOI: 10.14711/thesis-991012730762303412