THESIS
2019
xviii, 111, that is, xx, 113 pages : illustrations ; 30 cm
Abstract
Interactive understanding of the 3D real world is both a hot topic and an open frontier
for academia and industry, because it places inherently high demands on effective and
efficient sensing and perception. Despite the emergence of multimodal sensors, sensing
remains restricted in its spatial, temporal, angular, spectral, and multimodal information,
and observations are often degraded and dynamic, which makes 3D scene perception challenging. This
Ph.D. thesis focuses on the fundamental problem of 3D scene understanding, namely 3D
surface perception, by learning from underdetermined sensory data acquired through sight and
touch. It covers geometric reconstruction from sparse views, scene recovery through
scattering media, and material identification through multimodal fusion.
First, to exploit observations from sparse views, SurfaceNet, the first
end-to-end learning framework for multiview stereopsis (MVS), is proposed to directly learn photo-consistency
and precisely extract the geometric structure. This work inspired subsequent
learning-based MVS algorithms that have led and revitalized the MVS community.
They include our follow-up version, SurfaceNet+, which exploits the
sparsity of 3D surfaces to markedly improve model completeness and reduce
the complexity of training and inference, achieving a speedup of more than 7x.
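A common way to let a network learn photo-consistency end to end is to unproject each view's pixel colors into a shared voxel grid, so that voxels on which several views agree in color are likely to lie on the surface. The Python sketch below illustrates only that unprojection step and is not the thesis implementation. It assumes pinhole cameras given as 3x4 projection matrices and nearest-pixel sampling; the helper `colored_voxel_cube` and its parameters are hypothetical.

```python
import numpy as np


def colored_voxel_cube(image, P, origin, voxel_size, s):
    """Fill an s x s x s voxel cube with colors sampled from one view.

    Hypothetical helper for illustration.
    image: (H, W, 3) array; P: (3, 4) pinhole projection matrix (assumed);
    origin: (3,) world coordinate of the cube's corner; voxel_size: voxel edge.
    Each voxel center is projected into the image and takes the color of the
    nearest pixel; voxels behind the camera or outside the image stay zero.
    """
    H, W, _ = image.shape
    idx = np.arange(s)
    gx, gy, gz = np.meshgrid(idx, idx, idx, indexing="ij")
    # World coordinates of voxel centers, shape (s, s, s, 3).
    centers = origin + (np.stack([gx, gy, gz], axis=-1) + 0.5) * voxel_size
    # Homogeneous coordinates (s, s, s, 4) projected to (s, s, s, 3).
    homog = np.concatenate([centers, np.ones((s, s, s, 1))], axis=-1)
    proj = homog @ P.T
    depth = proj[..., 2]
    safe = np.where(depth > 1e-9, depth, 1.0)  # avoid dividing by zero or negative depth
    col = np.round(proj[..., 0] / safe).astype(int)
    row = np.round(proj[..., 1] / safe).astype(int)
    valid = (depth > 1e-9) & (col >= 0) & (col < W) & (row >= 0) & (row < H)
    cube = np.zeros((s, s, s, 3), dtype=image.dtype)
    cube[valid] = image[row[valid], col[valid]]
    return cube
```

Stacking such cubes from two or more views along the channel axis gives a 3D array that a 3D convolutional network could score voxel by voxel; that network is not shown here.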
Moreover, sensory degradation caused by scattering media, such as fog, frosted glass,
biological tissue, and opaque obstacles, is widespread in real-world scenarios. Therefore,
3D surface perception systems strongly need the ability to see through scattering with
limited temporal resolution. For example, precisely extracting the vascular structure
is valuable for clinical diagnosis. Because labeled data are scarce, a generic unsupervised
domain-adversarial network is proposed to extract the vasculature for subsequent
in vivo disease diagnosis.
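The abstract does not detail the architecture, but one well-known way to train a domain-adversarial network without target labels is a gradient reversal layer (GRL). A GRL passes features through unchanged in the forward pass and multiplies the incoming gradient by -lambda in the backward pass, so the domain classifier learns to tell source from target while the feature extractor learns to confuse it. The NumPy sketch below is an illustrative assumption, not the method of the thesis. It uses a toy linear feature extractor and a logistic domain classifier with hand-derived gradients, shows one update of the domain branch only, and the names `domain_adversarial_step`, `lam`, and `lr` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def domain_adversarial_step(W, v, x, d, lam=1.0, lr=0.1):
    """One domain-branch update through a gradient reversal layer (hypothetical toy).

    W: (k, n) toy linear feature extractor, features h = W @ x.
    v: (k,) logistic domain classifier, p = sigmoid(v @ h) = P(source).
    x: (n,) input; d: 1.0 for a source sample, 0.0 for a target sample.
    Returns the updated (W, v) and the domain loss before the update.
    """
    h = W @ x
    p = sigmoid(v @ h)
    eps = 1e-12
    loss = -(d * np.log(p + eps) + (1.0 - d) * np.log(1.0 - p + eps))
    dz = p - d                  # dLoss/dz for binary cross-entropy on a sigmoid
    grad_v = dz * h             # ordinary gradient for the domain classifier
    grad_h = dz * v             # gradient arriving at the features
    # GRL: identity in the forward pass, multiply the gradient by -lam backward.
    grad_W = np.outer(-lam * grad_h, x)
    # The classifier descends the domain loss; through the GRL the extractor
    # effectively ascends it, pushing features toward domain invariance.
    return W - lr * grad_W, v - lr * grad_v, loss
```

In a full model the same features would also feed a task head (for example, a vessel segmentation head) trained on labeled source data; that branch is omitted here.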
Lastly, comprehensive understanding of a 3D scene requires 3D surface perception to fuse
multimodal sensory data. For example, as a complement to contactless visual sensors, contact-based haptic sensing
is crucial for analyzing surface materials from a different perspective. Whereas
haptic signals encode material-invariant sub-surface statistics, color images
focus more on material-irrelevant texture patterns. To adaptively fuse
these multimodal data, a learning framework is presented and shows great potential.
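The abstract does not specify how the framework weights the two modalities. As an assumption, one simple form of adaptive fusion is a softmax gate that scores each modality's embedding and combines the embeddings with the resulting weights, so that the more informative modality (for example, haptic signals for sub-surface properties) can dominate the fused feature. In the NumPy sketch below, the function `gated_fusion` and the scoring vectors `gate_weights` are hypothetical; in practice the scoring vectors would be learned jointly with the modality encoders.

```python
import numpy as np


def gated_fusion(features, gate_weights):
    """Fuse per-modality embeddings with input-dependent softmax weights.

    features: dict mapping modality name -> (d,) embedding.
    gate_weights: dict mapping modality name -> (d,) scoring vector
    (hypothetical; learned in a real model).
    Returns the fused (d,) embedding and the weight given to each modality.
    """
    names = sorted(features)
    scores = np.array([gate_weights[n] @ features[n] for n in names])
    scores = scores - scores.max()          # numerical stability for exp
    alphas = np.exp(scores) / np.exp(scores).sum()
    fused = sum(a * features[n] for a, n in zip(alphas, names))
    return fused, dict(zip(names, alphas))
```

Because the weights depend on the embeddings themselves, a sample whose visual texture is uninformative can shift weight toward the haptic embedding without retraining.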