THESIS
2020
xv, 111 pages : illustrations ; 30 cm
Abstract
3D object detection and tracking play a significant role for autonomous driving vehicles
where the time-independent detection undertakes the fundamental perception, and
continuous object tracking further enables temporal motion prediction and planning. In
this thesis, we aim to push the limit of image-based 3D object estimation ability step by
step by fully exploiting different levels of visual information.
We start with the ego-motion tracking problem by proposing a tightly-coupled visual-inertial
state estimator with loop-closure capability, which can be used for autonomous
robot navigation and augmented reality. We then extend it naturally to object
estimation in autonomous driving scenarios, where we combine object-level semantic
priors with our dynamic object bundle adjustment (BA) based on sparse feature correspondence
geometry, and obtain 3D object pose, velocity, and anchored dynamic point cloud
estimates with instance accuracy and temporal consistency. To compensate for the insufficiency of sparse feature representations in handling small or heavily occluded objects,
we design a Stereo R-CNN network that detects and associates objects in stereo images and
predicts the corresponding object properties (keypoints, dimensions, etc.); coarse 3D object
bounding boxes are then calculated from this object-level information. We then recover
the accurate 3D bounding box by refining the object disparity using dense photometric
alignment between the left and right RoIs. The sub-pixel object disparity estimation enables
our method to outperform all existing fully supervised image-based methods, while requiring
neither depth input nor 3D position supervision.
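The disparity-refinement idea can be illustrated with a minimal sketch (hypothetical NumPy code, not the thesis implementation): given a left RoI patch and the right image, sweep sub-pixel disparity candidates around a coarse estimate and keep the one that minimizes the summed photometric error.

```python
import numpy as np

def refine_disparity(left_roi, right_img, x0, y0, d_coarse,
                     search=2.0, step=0.05):
    """Refine an object's disparity by dense photometric alignment (toy sketch).

    left_roi : (H, W) grayscale patch cropped from the left image
    right_img: full grayscale right image
    (x0, y0) : top-left corner of the RoI in the left image
    d_coarse : coarse disparity in pixels (e.g. from a coarse 3D box)

    Sweeps sub-pixel disparity candidates around d_coarse and returns the
    one minimizing the sum of squared photometric differences.
    """
    h, w = left_roi.shape
    ys = np.arange(y0, y0 + h)
    best_d, best_cost = d_coarse, np.inf
    for d in np.arange(d_coarse - search, d_coarse + search + step, step):
        xs = np.arange(x0, x0 + w) - d      # left RoI columns shifted into the right image
        xi = np.floor(xs).astype(int)
        a = xs - xi                         # horizontal linear-interpolation weight
        if xi.min() < 0 or xi.max() + 1 >= right_img.shape[1]:
            continue                        # candidate falls outside the right image
        # horizontally interpolated right patch at sub-pixel disparity d
        right_patch = ((1 - a) * right_img[np.ix_(ys, xi)]
                       + a * right_img[np.ix_(ys, xi + 1)])
        cost = np.sum((left_roi - right_patch) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

The exhaustive sweep stands in for the alignment solved in the thesis; the point is that a dense photometric cost over the whole RoI is far less sensitive to individual noisy pixels than sparse feature matches, which is what makes sub-pixel disparity (and hence accurate depth) recoverable.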
Based on the proposed temporal object geometric modeling and dense photometric
alignment, we further integrate them into an elegant 3D object tracking framework that
handles simultaneous detection & association via learned correspondences, and solves
continuous estimation by fully exploiting dense spatial-temporal constraints in
sequential stereo images. Extensive experiments on the KITTI dataset show that our approach
outperforms previous image-based methods by significant margins and achieves a new
state of the art.
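The interplay of association and continuous motion estimation can be sketched with a toy nearest-neighbor tracker (hypothetical code, far simpler than the learned correspondences and joint spatio-temporal optimization in the thesis): each track predicts its next 3D position with a constant-velocity model, matches the closest detection within a gate, and re-estimates velocity from the position update.

```python
import numpy as np

def associate(tracks, detections, gate=2.0):
    """Greedy nearest-neighbor association in 3D (toy illustration).

    tracks:     list of dicts with 'pos' (3,) and 'vel' (3,) arrays
    detections: (N, 3) array of detected 3D object centers
    Returns a list of (track_index, detection_index) pairs.
    """
    pairs, used = [], set()
    for ti, t in enumerate(tracks):
        pred = t['pos'] + t['vel']                    # constant-velocity prediction
        dists = np.linalg.norm(detections - pred, axis=1)
        for di in np.argsort(dists):                  # nearest unused detection first
            if int(di) not in used and dists[di] < gate:
                pairs.append((ti, int(di)))
                used.add(int(di))
                break
    return pairs

def update(tracks, detections, pairs, alpha=0.5):
    """Update matched tracks: blend the predicted and detected positions,
    then re-estimate velocity from the resulting position change."""
    for ti, di in pairs:
        t = tracks[ti]
        new_pos = (1 - alpha) * (t['pos'] + t['vel']) + alpha * detections[di]
        t['vel'] = new_pos - t['pos']
        t['pos'] = new_pos
```

In the thesis the association comes from learned correspondences and the state update from a dense spatio-temporal optimization rather than this fixed blend, but the loop structure (predict, associate, refine pose and velocity) is the same.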