RGBD based 3D geometric modeling and semantic understanding

HKUST Electronic Theses


by Lei Han

THESIS 2020

Ph.D. Electronic and Computer Engineering

xv, 161 pages : illustrations ; 30 cm

Abstract

3D scene perception, with a variety of applications in robotics and augmented reality, is in high demand in both academia and industry, yet it is still at an early stage and suffers from limited robustness and scalability. This Ph.D. thesis focuses on two fundamental problems in 3D scene perception: scalable geometric modeling and semantic understanding of real-world environments.

Firstly, we identify globally consistent camera pose estimation as critical for robust 3D modeling. Because an exhaustive search over all previous observations is infeasible for large-scale datasets, Loop Closure Detection (LCD) has proven extremely useful for achieving global consistency of visual observations. State-of-the-art LCD methods have gained popularity for their efficiency, yet they suffer from low recall due to an inherent drawback: high-dimensional binary feature descriptors lack well-defined centroids. We therefore propose a real-time LCD approach called MILD (Multi-Index Hashing for Loop closure Detection), in which image similarity is measured directly by feature matching; with the aid of the Multi-Index Hashing (MIH) technique, this achieves high recall without introducing extra computational complexity. We further introduce GCSLAM, a robust globally consistent pose estimation approach that minimizes the registration error of the visual observations collected by MILD. GCSLAM achieves state-of-the-art accuracy while maintaining high efficiency, based on our proposed FastGO technique for Fast Globally Consistent Point Cloud Registration. Building on this globally consistent camera pose estimation, we present FlashFusion, a system for real-time dense 3D reconstruction on portable devices for AR/VR applications.
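
The MIH idea behind MILD can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the thesis's implementation: descriptors are assumed to be 256-bit binary codes (ORB-like) stored as Python integers, each code is split into M = 8 hypothetical 32-bit substrings indexed in separate hash tables, and a database image's similarity score is simply the number of query descriptors that match it within a Hamming radius. By the pigeonhole principle, any code within Hamming distance r < M of the query agrees exactly with it on at least one substring, so exact substring lookups retrieve every true match without a linear scan over the database.

```python
import random
from collections import defaultdict

BITS = 256          # assumed descriptor length (ORB-like binary codes)
M = 8               # assumed number of substrings / hash tables
SUB = BITS // M     # bits per substring (32 here)
MASK = (1 << SUB) - 1


def substrings(code: int) -> list[int]:
    """Split a BITS-bit code into M disjoint SUB-bit substrings."""
    return [(code >> (i * SUB)) & MASK for i in range(M)]


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes."""
    return bin(a ^ b).count("1")


class MultiIndexHash:
    """Hypothetical minimal multi-index hash over binary descriptors."""

    def __init__(self) -> None:
        self.tables = [defaultdict(list) for _ in range(M)]
        self.entries: list[tuple[int, int]] = []  # (image_id, code)

    def add_image(self, image_id: int, descriptors: list[int]) -> None:
        for code in descriptors:
            idx = len(self.entries)
            self.entries.append((image_id, code))
            # Index the descriptor once per substring table.
            for table, key in zip(self.tables, substrings(code)):
                table[key].append(idx)

    def query(self, code: int, radius: int) -> list[int]:
        # Exact substring lookup is complete only if radius < M (pigeonhole).
        if radius >= M:
            raise ValueError("radius must be smaller than the number of substrings")
        candidates: set[int] = set()
        for table, key in zip(self.tables, substrings(code)):
            candidates.update(table.get(key, ()))
        # Verify candidates with the full Hamming distance.
        return [i for i in candidates if hamming(code, self.entries[i][1]) <= radius]

    def similarity(self, descriptors: list[int], radius: int) -> dict[int, int]:
        """Score each database image by its number of matched query descriptors."""
        votes: dict[int, int] = defaultdict(int)
        for code in descriptors:
            for i in self.query(code, radius):
                votes[self.entries[i][0]] += 1
        return dict(votes)


# Usage: a query frame that revisits image 1 with 3 flipped bits per descriptor.
rng = random.Random(0)
db = MultiIndexHash()
images = {img: [rng.getrandbits(BITS) for _ in range(20)] for img in range(3)}
for img, descs in images.items():
    db.add_image(img, descs)
query = [c ^ sum(1 << b for b in rng.sample(range(BITS), 3)) for c in images[1]]
sims = db.similarity(query, radius=7)
print(sims)
```

Because the score is counted from direct descriptor matches rather than from descriptors quantized to visual words, no centroid of binary codes is ever needed. Radii of M or more would require probing neighboring substring values in each table, which this sketch omits.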

Moreover, we demonstrate that semantic understanding of the environment is the key component for a scalable representation of 3D environments, since it simplifies independent point clouds into meaningful objects. Unlike images, which are represented by densely organized pixels in 2D space, 3D scenes are normally represented by unordered point clouds, which makes it difficult to apply convolutional neural networks to 3D scene understanding. For efficient and robust semantic understanding of 3D environments, we propose a chunk-based spatially sparse convolution scheme, built on the insight that points lie on continuous 2D surfaces in 3D space; the scheme is 4x faster than previous state-of-the-art methods. We also introduce a novel occupancy signal for robust instance-level semantic understanding of 3D environments, which achieves state-of-the-art accuracy on public datasets. Finally, we present a building-scale dense 3D reconstruction system with a room-level loop closure detector that relies on the proposed semantic understanding approaches applied to the reconstructed 3D model.
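
The chunk-based idea can be sketched in Python with NumPy to show why surface data suits a chunked layout. This is only an assumed data layout, not the thesis's convolution scheme: `build_chunks`, the 5 cm voxel size, and the 8x8x8 chunk size are hypothetical choices. Occupied voxels are grouped into dense fixed-size blocks so that ordinary dense kernels could run inside each block, while empty regions of space are never allocated.

```python
import numpy as np


def build_chunks(points: np.ndarray, voxel_size: float = 0.05, chunk: int = 8) -> dict:
    """Group occupied voxels into dense chunk^3 boolean blocks keyed by chunk index.

    Hypothetical layout: points are (N, 3) coordinates, assumed in meters.
    """
    vox = np.unique(np.floor(points / voxel_size).astype(np.int64), axis=0)
    chunk_ids = np.floor_divide(vox, chunk)   # which chunk each voxel falls in
    local = vox - chunk_ids * chunk           # voxel position inside its chunk
    chunks: dict[tuple, np.ndarray] = {}
    for cid, loc in zip(map(tuple, chunk_ids.tolist()), local.tolist()):
        block = chunks.get(cid)
        if block is None:  # allocate only chunks that contain points
            block = chunks[cid] = np.zeros((chunk, chunk, chunk), dtype=bool)
        block[tuple(loc)] = True
    return chunks


# Usage: a flat 2 m x 2 m surface sampled at voxel centers, at height 1 cm.
xs = (np.arange(40) + 0.5) * 0.05
gx, gy = np.meshgrid(xs, xs, indexing="ij")
plane = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, 0.01)], axis=1)
chunks = build_chunks(plane)
fill = np.mean([b.mean() for b in chunks.values()])
print(len(chunks), fill)  # 25 chunks, each 64/512 = 0.125 full
```

For this flat surface, only the 25 chunks touching the plane are stored, and each is one-eighth full because a surface crosses an 8-voxel-thick block in a single layer. Real scans are less regular, so actual fill ratios will vary.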


Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Electronic and Computer Engineering
Supervisors: Shi, Ling
Authors: Han, Lei
Subjects: Visual perception; Mathematical models; Three-dimensional imaging; Three-dimensional modeling; Hashing (Computer science); Mobile robots; Automatic control
Language: English
Call number: Thesis ECED 2020 Han
DOI: 10.14711/thesis-991012879762703412
