THESIS
2023
1 online resource (xv, 89 pages) : illustrations (chiefly color)
Abstract
Identifying robust and accurate visual correspondences across images, also known as image
matching, has been a long-standing topic in computer vision research. In particular, image
matching serves as a fundamental step in reconstructing real-world geometry from multi-view
photos, and it has received widespread attention from a wide variety of industrial applications, including
the metaverse, AR/VR, and autonomous driving. Traditionally, image matching involves a series
of discrete steps and hand-crafted algorithms. Although proven effective in general cases,
manually designed features and matching strategies are often insufficient to cope with challenging
matching scenarios, such as low-texture regions, large perspective changes, and low overlap
rates. In this thesis, we are dedicated to advancing the accuracy frontier and robustness of image
matching algorithms, particularly through the use of deep learning techniques.
We first propose a graph neural network (GNN), which inherits the traditional keypoint-based
matching scheme, to regularize matching costs by reasoning about visual similarity and
matching consensus. Specifically, to avoid exhaustive interaction among image keypoints, we
leverage a small set of pre-selected, relatively reliable matches, referred to as seed matches, to
guide the matching of the whole keypoint set. By integrating seed matches with a series of efficient
attentive operations, we show that even a very limited set of seeds can provide strong clues
to assist the matching of other keypoints. Through comprehensive experiments, we demonstrate
that our approach achieves competitive performance compared with state-of-the-art GNN-based matchers while maintaining modest computational costs.
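To make the seeding idea concrete, the following is a minimal NumPy sketch of attention guided by seed matches, not the thesis implementation: every keypoint descriptor attends only to the S seed descriptors, so the cost is O(N·S) rather than the O(N²) of exhaustive all-pairs interaction. All names, shapes, and the residual-update form are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def seeded_attention(desc, seed_idx):
    """Update all N keypoint descriptors by attending only to S seeds.

    desc:     (N, D) keypoint descriptors
    seed_idx: indices of the S << N seed matches
    Cost is O(N*S) instead of the O(N^2) of full self-attention.
    """
    seeds = desc[seed_idx]                             # (S, D)
    scores = desc @ seeds.T / np.sqrt(desc.shape[1])   # (N, S) similarities
    attn = softmax(scores, axis=-1)                    # attention over seeds
    return desc + attn @ seeds                         # residual update

rng = np.random.default_rng(0)
desc = rng.standard_normal((100, 64))    # 100 keypoints, 64-d descriptors
out = seeded_attention(desc, seed_idx=np.arange(8))   # 8 seeds
print(out.shape)  # (100, 64)
```

With only 8 seeds, each of the 100 keypoints receives contextual information through 8 score entries instead of 100, which is the source of the efficiency gain the thesis exploits.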
Moving beyond keypoint-based matching, we then present an end-to-end Transformer-based
matcher that works directly on raw image pairs and skips the keypoint detection step.
To tackle the quadratic complexity caused by the dense operations of the vanilla Transformer, we propose
a global-local attention framework that provides both global long-range interaction and local
fine-level interaction. Specifically, instead of fixing the local attention span at a constant size, we adjust
it according to learned matching uncertainty, which balances matching coverage and interaction
granularity in an adaptive way. Through comprehensive evaluation, we show that the proposed
attention framework significantly improves the quality of the obtained matches and boosts the accuracy
of camera pose estimation. In particular, we outperform counterparts that also adopt
efficient Transformer designs by a large margin.
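A toy sketch of the adaptive-span idea, under assumed conventions rather than the thesis architecture: a learned per-location uncertainty in [0, 1] is mapped to a local window radius, so confident locations interact finely over a small span while uncertain ones gather broader context. The radius bounds, the linear mapping, and the 1-D windowed averaging (a stand-in for windowed attention) are all illustrative.

```python
import numpy as np

def adaptive_span(uncertainty, r_min=2, r_max=8):
    """Map matching uncertainty in [0, 1] to a local attention radius.

    Confident locations get a small window (fine-grained interaction);
    uncertain locations get a larger window (broader matching coverage).
    """
    u = np.clip(uncertainty, 0.0, 1.0)
    return np.round(r_min + u * (r_max - r_min)).astype(int)

def local_attend(feat, center, radius):
    """Average features inside a 1-D window of the given radius
    (a toy stand-in for windowed attention on a feature map)."""
    lo, hi = max(0, center - radius), min(len(feat), center + radius + 1)
    return feat[lo:hi].mean(axis=0)

feat = np.arange(20, dtype=float).reshape(20, 1)
radii = adaptive_span(np.array([0.0, 1.0]))
print(radii)                                   # [2 8]
print(local_attend(feat, 10, int(radii[1])))   # [10.]
```

The key design point mirrored here is that the span is a function of the model's own uncertainty estimate, not a fixed hyperparameter, which is what lets coverage and granularity trade off adaptively.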
Finally, taking one step further from our previous work, we propose geometry-aware deformable
attention to enhance the local attention in the Transformer-based matcher. To better
model the ubiquitous local deformation caused by viewpoint changes, we estimate a patch-wise
parametric deformation field from intermediate matching results, which is used to shape the
local attention pattern. Through this design, we embed deformation priors into the
matching process in a principled and intuitive manner. Experiments show that our design considerably
improves the effectiveness of the global-local attention framework and produces high-quality visual
correspondences for geometry estimation tasks.
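The following is a minimal sketch of deformation-shaped local sampling, assuming a 2×2 affine matrix as the patch-wise parametric deformation and nearest-neighbour gathering in place of attention; none of this is the thesis implementation. The regular sampling grid around each query location is warped by the estimated deformation before features are gathered, so the attention footprint follows the local geometry instead of an axis-aligned square.

```python
import numpy as np

def deformed_offsets(radius, A):
    """Warp a regular (2r+1)x(2r+1) sampling grid by a patch-wise affine
    deformation A (2x2), so local attention samples along the estimated
    local geometry rather than an axis-aligned window."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(float)  # (K, 2)
    return grid @ A.T  # deformed sampling offsets, (K, 2)

def sample(feat, center, offsets):
    """Nearest-neighbour gather of feature values at deformed locations
    (a stand-in for bilinear sampling inside deformable attention)."""
    H, W = feat.shape[:2]
    pts = np.rint(center + offsets).astype(int)
    pts[:, 0] = np.clip(pts[:, 0], 0, H - 1)
    pts[:, 1] = np.clip(pts[:, 1], 0, W - 1)
    return feat[pts[:, 0], pts[:, 1]]

# A shear, standing in for a deformation estimated from intermediate matches:
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
offs = deformed_offsets(radius=1, A=A)          # 3x3 grid, sheared
feat = np.arange(64, dtype=float).reshape(8, 8)
vals = sample(feat, center=np.array([4.0, 4.0]), offsets=offs)
print(vals.shape)  # (9,)
```

With the identity matrix the sketch reduces to the plain square window of the earlier global-local framework, which is the sense in which the deformation field "shapes" the existing local attention pattern rather than replacing it.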
With intensive investigation and innovation, we aspire to further advance the performance
of image matching for geometric estimation tasks and empower a wider range of 2D and 3D
applications.