THESIS
2022
1 online resource (xviii, 136 pages) : illustrations (some color)
Abstract
In past decades, 3D point cloud data has become one of the most important data
types across a variety of domains, such as robotics, architecture, and especially
autonomous driving assistance systems (ADAS), because of its robustness and spatial accuracy.
To better understand point cloud data in the big data era, machine learning-based
approaches have gradually come to occupy a dominant position. Thus, in this thesis, I focus on
spatial and temporal feature extraction and processing for point cloud sequences in learning-based
methods.
In ADS, point clouds are widely used in perception, localization, and mapping.
Point cloud collection, storage, and transmission come first and are the prerequisites
of those downstream tasks. This thesis explores several feature extraction networks in two
different directions: point cloud compression, and multiple object detection and tracking.
The perception task can be seen as one of the downstream tasks of data compression.
For end-to-end point cloud compression, I first propose a baseline range image-based
method to show that the range image-based compression framework outperforms
octree-based methods for scanning LiDARs in autonomous driving. Then, motivated by
video compression, I introduce a hybrid point cloud sequence compression framework,
which consists of a static and a dynamic learning-based point cloud compression algorithm.
In the static compression framework, a geometry-aware attention layer helps remove
spatial redundancy. In the dynamic compression framework, a conv-LSTM with a
GHU module is used to remove temporal redundancy.
For the downstream task, 3D multiple object detection and tracking, I first propose a “fake”
end-to-end tracking-with-detection framework that predicts the objects’ movement to improve
data association accuracy. Then I introduce a “real” end-to-end multiple object tracking (MOT)
network, ST-TrackNet, which rearranges the object detections in a spatio-temporal map and then
directly predicts the object track ID without the data association step. Building on this research,
I propose DiTNet, which integrates a detection module with the tracking network. The features
from the detection module help improve the tracking performance, and the tracking
module, with its final trajectories, in turn helps refine the detection results. Lastly, I summarize
this thesis and propose future research opportunities.