THESIS
2022
1 online resource (xv, 111 pages) : color illustrations
Abstract
3D semantic segmentation is an indispensable cornerstone of thorough 3D scene understanding, and it faces different challenges in indoor and outdoor scenes due to their respective characteristics. In indoor scenes, objects are densely placed and have varied structures, which poses two major challenges for 3D semantic segmentation: (1) how to generate accurate and clear segmentation boundaries, and (2) how to extract surface information from complex and irregular geometries. Compared with indoor scenes, outdoor scenes have much larger scanning ranges and contain millions of points, which poses a fundamental question for 3D semantic segmentation: how to label outdoor scene datasets efficiently. This thesis presents three methods that address these challenges.
To address the first challenge in indoor scenes, we introduce the task of semantic edge detection to the 3D field. It serves as the dual task of 3D semantic segmentation and focuses on segmentation boundaries. Adopting the idea of complementary learning, we present JSENet, a novel joint learning framework that substantially improves the segmentation boundaries of indoor scenes by explicitly exploiting the duality between the two tasks.
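This duality can be made concrete with a minimal sketch (this is not the JSENet implementation; the function names, the use of k-nearest-neighbor indices, and the total-variation edge measure are assumptions made only for illustration). A soft boundary map is derived from the segmentation probabilities, and a consistency loss ties it to the output of a dedicated edge branch:

    import torch
    import torch.nn.functional as F

    def seg_to_edge(seg_logits, neighbor_idx):
        # Derive a soft boundary score per point from segmentation probabilities:
        # a point looks boundary-like if its class distribution differs from its neighbors'.
        probs = seg_logits.softmax(dim=-1)                 # (N, C)
        neigh = probs[neighbor_idx]                        # (N, K, C) neighbor distributions
        # mean total-variation distance to the K neighbors, in [0, 1]
        return 0.5 * (probs.unsqueeze(1) - neigh).abs().sum(-1).mean(1)

    def duality_loss(seg_logits, edge_logits, neighbor_idx):
        # Encourage the edge branch and the segmentation branch to agree on boundaries.
        edge_from_seg = seg_to_edge(seg_logits, neighbor_idx)  # boundaries implied by segmentation
        edge_pred = edge_logits.sigmoid()                       # boundaries predicted by edge branch
        return F.mse_loss(edge_pred, edge_from_seg)

    # Toy usage with random tensors standing in for network outputs.
    N, C, K = 1024, 20, 16
    seg_logits = torch.randn(N, C, requires_grad=True)
    edge_logits = torch.randn(N, requires_grad=True)
    neighbor_idx = torch.randint(0, N, (N, K))
    duality_loss(seg_logits, edge_logits, neighbor_idx).backward()

The intent of such a coupling is that gradients from the consistency term flow into both branches, so the segmentation output is pushed toward cleaner, better-localized boundaries.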
Further, to address the second challenge of extracting surface information from the complex and irregular geometries of indoor objects, we adopt the often-overlooked mesh representation, in which valuable geodesic information about geometric surfaces is naturally embedded. We propose VMNet, a novel deep architecture that operates on voxel and mesh representations simultaneously. By leveraging both the Euclidean information embedded in voxels and the geodesic information embedded in meshes, we develop a geodesic-aware 3D semantic segmentation method for indoor scenes that produces accurate segmentation results on complex geometries.
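A rough sketch of how the two representations could be combined (illustrative only; the simple averaging operations, the voxel size, and the tensor layout below are assumptions rather than the VMNet architecture):

    import torch

    def voxel_average(feats, coords, voxel_size=0.05):
        # Euclidean branch (sketch): pool vertex features inside each voxel,
        # then scatter the pooled feature back onto the vertices.
        vox = torch.floor(coords / voxel_size).long()              # (N, 3) voxel indices
        _, inv = torch.unique(vox, dim=0, return_inverse=True)     # voxel id per vertex
        num_vox = int(inv.max()) + 1
        pooled = torch.zeros(num_vox, feats.size(1)).index_add_(0, inv, feats)
        counts = torch.zeros(num_vox).index_add_(0, inv, torch.ones(len(feats)))
        return (pooled / counts.clamp(min=1).unsqueeze(1))[inv]    # (N, F) back on vertices

    def mesh_neighbor_average(feats, edges):
        # Geodesic branch (sketch): average each vertex's features over the vertices
        # connected to it by a mesh edge (its 1-ring neighborhood).
        # edges: (E, 2) long tensor holding both directions of every mesh edge.
        src, dst = edges[:, 0], edges[:, 1]
        out = torch.zeros_like(feats).index_add_(0, dst, feats[src])
        deg = torch.zeros(feats.size(0)).index_add_(0, dst, torch.ones(len(src)))
        return out / deg.clamp(min=1).unsqueeze(1)

    def fuse(feats, coords, edges):
        # Combine Euclidean (voxel) and geodesic (mesh) context, here by concatenation.
        return torch.cat([voxel_average(feats, coords),
                          mesh_neighbor_average(feats, edges)], dim=1)

The point of the sketch is that the same vertices carry two complementary kinds of context: which points are nearby in Euclidean space (shared voxels) and which are connected along the surface (mesh edges).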
Finally, to address the third challenge in outdoor scenes, we study the task of label-efficient 3D semantic segmentation. Outdoor scenes are generally captured as continuous LiDAR frame sequences containing a large number of points that are expensive to label and informatively redundant. We propose to exploit inter-frame correlation to tackle this information redundancy. By estimating model uncertainty from the inconsistency of predictions across consecutive frames, we design LiDAL, a novel active learning strategy for 3D LiDAR semantic segmentation of outdoor scenes that significantly reduces annotation costs.
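The selection idea can be sketched as follows (illustrative only, not the LiDAL algorithm; the symmetric-KL measure, the assumption of given point correspondences, and frame-level selection are simplifications made to keep the example small):

    import torch

    def inter_frame_inconsistency(probs_a, probs_b, eps=1e-8):
        # Disagreement between predictions of two overlapping LiDAR frames on
        # corresponding points (correspondences are assumed given, e.g. after
        # registering the frames into a common coordinate system).
        pa, pb = probs_a.clamp(min=eps), probs_b.clamp(min=eps)
        kl_ab = (pa * (pa / pb).log()).sum(-1)
        kl_ba = (pb * (pb / pa).log()).sum(-1)
        return 0.5 * (kl_ab + kl_ba)                  # (N,) per-point inconsistency

    def select_frames(frame_pairs, budget):
        # Rank unlabeled frames by mean inter-frame inconsistency and request
        # labels for the most inconsistent ones.
        scores = {fid: inter_frame_inconsistency(pa, pb).mean().item()
                  for fid, (pa, pb) in frame_pairs.items()}
        return sorted(scores, key=scores.get, reverse=True)[:budget]

    # Toy usage: random softmax outputs standing in for two passes over each frame.
    frame_pairs = {i: (torch.softmax(torch.randn(5000, 19), -1),
                       torch.softmax(torch.randn(5000, 19), -1)) for i in range(10)}
    to_label = select_frames(frame_pairs, budget=3)

Points on which the model's predictions disagree between neighboring scans of the same surface are the ones it is least certain about, so spending the annotation budget there is expected to be most informative.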
We conduct extensive experiments to demonstrate the effectiveness of our methods.