THESIS
2022
1 online resource (xvii, 89 pages) : illustrations (chiefly color)
Abstract
Rigid object pose estimation aims to predict the target object's orientation, position, and size. It is a key component of many real-world applications, including but not limited to robotic manipulation, augmented reality, and autonomous driving. Recently, the rapid development of deep learning techniques has inspired a variety of learning-based approaches to rigid object pose estimation. In this thesis, we advance learning-based rigid object pose estimation in three aspects: improving pose estimation accuracy, enhancing network generalizability, and eliminating the reliance on manual labels.
First, we improve the accuracy of learning-based object pose estimation by enhancing its two main sub-modules: the representation learning backbone that extracts features from RGBD inputs, and the subsequent output representation for pose estimation. For representation learning, we introduce a full-flow bidirectional fusion network that combines the complementary information residing in the RGB and depth images. Features with rich semantic and geometric information are extracted for the precise regression of different downstream tasks. For the output representation, we introduce a 3D-keypoint-based algorithm that performs joint instance semantic segmentation and 3D keypoint detection. The pose parameters are then estimated by least-squares fitting. Our 3D-keypoint-based formulation fully leverages the geometric constraints of the rigid object and is easy for a network to learn and optimize.
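To make the least-squares fitting step concrete, the sketch below recovers a rotation R and translation t from corresponding 3D keypoints using the standard SVD-based (Kabsch/Umeyama) solution; the function and variable names are illustrative assumptions, not code from the thesis.

```python
# Minimal sketch of SVD-based least-squares pose fitting from 3D keypoint
# correspondences (Kabsch/Umeyama); names are illustrative, not the thesis's.
import numpy as np

def fit_rigid_pose(kps_model, kps_pred):
    """Solve for R, t minimizing sum_i ||R @ m_i + t - p_i||^2.

    kps_model: (K, 3) keypoints in the object's canonical frame.
    kps_pred:  (K, 3) corresponding keypoints detected in the camera frame.
    """
    mu_m = kps_model.mean(axis=0)                  # centroid of model keypoints
    mu_p = kps_pred.mean(axis=0)                   # centroid of detected keypoints
    H = (kps_model - mu_m).T @ (kps_pred - mu_p)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det(R) = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t
```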
Second, we enhance the generalizability of pose estimation algorithms by eliminating the closed-set assumption and the reliance on high-fidelity object CAD models. We study the few-shot open-set 6D pose estimation problem, which aims to estimate the pose of unknown objects given only a few support views. We propose a large-scale photorealistic dataset (ShapeNet6D) for network pre-training and introduce a dense prototype matching network to tackle the pose estimation problem. We also establish a benchmark to facilitate future research on this challenging new problem.
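As a rough illustration of dense prototype matching, the sketch below scores per-point query features against prototype features aggregated from the support views via cosine similarity; this conveys only the general idea, and all names are hypothetical rather than the thesis's network.

```python
# Hypothetical sketch of dense feature matching between a query view and
# support-view prototypes using cosine similarity; an illustration of the
# general idea only, not the thesis's dense prototype matching network.
import numpy as np

def match_query_to_prototypes(query_feats, proto_feats):
    """query_feats: (Nq, C) per-point features of the observed object.
    proto_feats:   (Np, C) prototype features aggregated from support views.
    Returns the best-matching prototype index and similarity per query point.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = proto_feats / np.linalg.norm(proto_feats, axis=1, keepdims=True)
    sim = q @ p.T                       # (Nq, Np) cosine similarity matrix
    idx = sim.argmax(axis=1)            # best prototype per query point
    return idx, sim[np.arange(len(idx)), idx]
```

Correspondences obtained this way could then feed a least-squares fitting step like the one sketched earlier.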
Finally, to eliminate the reliance on time-consuming and labor-intensive manual labels, we propose a self-supervised framework for category-level object pose and size estimation. Specifically, we propose a label-free method that learns to enforce geometric consistency between the category template mesh and the observed object point cloud in a self-supervised manner. Given the template mesh and the observed scene point cloud, we leverage differentiable shape deformation, registration, and rendering to enforce this geometric consistency for self-supervision.
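A common way to express such a geometric consistency term is a chamfer distance between the posed template points and the observed point cloud; the NumPy sketch below illustrates only this term, under the assumption that deformation and registration have already produced the posed template, and is not the thesis's exact objective.

```python
# Minimal sketch of a chamfer-distance consistency term between the posed
# template point cloud and the observed object point cloud. In the thesis the
# deformation, registration, and rendering steps are differentiable; this
# NumPy version is an illustrative assumption, not the exact loss.
import numpy as np

def chamfer_distance(template_pts, observed_pts):
    """template_pts: (N, 3) template points after deformation and posing.
    observed_pts:  (M, 3) observed scene object point cloud.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = ((template_pts[:, None, :] - observed_pts[None, :, :]) ** 2).sum(-1)
    # Symmetric nearest-neighbor terms.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```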