THESIS
2022
1 online resource (xvii, 89 pages) : illustrations (chiefly color)
Abstract
Rigid object pose estimation aims to predict the target object's orientation, position, and size. It is a key component of many real-world applications, including but not limited to robotic manipulation, augmented reality, and autonomous driving. Recently, the rapid development of deep learning techniques has inspired a variety of learning-based approaches to rigid object pose estimation. In this thesis, we advance learning-based rigid object pose estimation in three aspects: improving pose estimation accuracy, enhancing network generalizability, and eliminating the reliance on manual labels.
First, we improve the accuracy of learning-based object pose estimation by enhancing its two main sub-modules: the representation learning backbone that extracts features from RGBD inputs, and the subsequent output representation for pose estimation. For representation learning, we introduce a full-flow bidirectional fusion network that combines the complementary information residing in the RGB and depth images. Features with rich semantic and geometric information are extracted for the precise regression of different downstream tasks. For the output representation, we introduce a 3D-keypoint-based algorithm that performs joint instance semantic segmentation and 3D keypoint detection. The pose parameters are then estimated by least-squares fitting. Our 3D-keypoint-based formulation fully leverages the geometric constraints of the rigid object and is easy for a network to learn and optimize.
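To make the least-squares fitting step concrete, the sketch below recovers a rotation R and translation t from corresponding 3D keypoints using the standard SVD-based (Kabsch/Umeyama) solution; the function and variable names are illustrative assumptions, not code from the thesis.

```python
# Minimal sketch of SVD-based least-squares pose fitting from 3D keypoint
# correspondences (Kabsch/Umeyama); names are illustrative, not the thesis's.
import numpy as np

def fit_rigid_pose(kps_model, kps_pred):
    """Solve for R, t minimizing sum_i ||R @ m_i + t - p_i||^2.

    kps_model: (K, 3) keypoints in the object's canonical frame.
    kps_pred:  (K, 3) corresponding keypoints detected in the camera frame.
    """
    mu_m = kps_model.mean(axis=0)                  # centroid of model keypoints
    mu_p = kps_pred.mean(axis=0)                   # centroid of detected keypoints
    H = (kps_model - mu_m).T @ (kps_pred - mu_p)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det(R) = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t
```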
Second, we enhance the generalizability of pose estimation algorithms by eliminating the closed-set assumption and the reliance on high-fidelity object CAD models. We study the few-shot open-set 6D pose estimation problem, which aims to estimate the pose of unknown objects given only a few support views. We propose a large-scale photorealistic dataset (ShapeNet6D) for network pre-training and introduce a dense prototype matching network to tackle the pose estimation problem. We also establish a benchmark to facilitate future research on this challenging new problem.
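As a rough illustration of dense prototype matching, the sketch below scores per-point query features against prototype features aggregated from the support views via cosine similarity; this conveys only the general idea, and all names are hypothetical rather than the thesis's network.

```python
# Hypothetical sketch of dense feature matching between a query view and
# support-view prototypes using cosine similarity; an illustration of the
# general idea only, not the thesis's dense prototype matching network.
import numpy as np

def match_query_to_prototypes(query_feats, proto_feats):
    """query_feats: (Nq, C) per-point features of the observed object.
    proto_feats:   (Np, C) prototype features aggregated from support views.
    Returns the best-matching prototype index and similarity per query point.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = proto_feats / np.linalg.norm(proto_feats, axis=1, keepdims=True)
    sim = q @ p.T                       # (Nq, Np) cosine similarity matrix
    idx = sim.argmax(axis=1)            # best prototype per query point
    return idx, sim[np.arange(len(idx)), idx]
```

Correspondences obtained this way could then feed a least-squares fitting step like the one sketched earlier.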
Finally, to eliminate the reliance on time-consuming and labor-intensive manual labels, we propose a self-supervised framework for category-level object pose and size estimation. Specifically, we propose a label-free method that learns to enforce geometric consistency between the category template mesh and the observed object point cloud in a self-supervised manner. Given the template mesh and the observed scene point cloud, we leverage differentiable shape deformation, registration, and rendering to enforce this geometric consistency for self-supervision.
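A common way to express such a geometric consistency term is a chamfer distance between the posed template points and the observed point cloud; the NumPy sketch below illustrates only this term, under the assumption that deformation and registration have already produced the posed template, and is not the thesis's exact objective.

```python
# Minimal sketch of a chamfer-distance consistency term between the posed
# template point cloud and the observed object point cloud. In the thesis the
# deformation, registration, and rendering steps are differentiable; this
# NumPy version is an illustrative assumption, not the exact loss.
import numpy as np

def chamfer_distance(template_pts, observed_pts):
    """template_pts: (N, 3) template points after deformation and posing.
    observed_pts:  (M, 3) observed scene object point cloud.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = ((template_pts[:, None, :] - observed_pts[None, :, :]) ** 2).sum(-1)
    # Symmetric nearest-neighbor terms.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```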