THESIS
2020
xvi, 136 pages : illustrations ; 30 cm
Abstract
Convenient and high-quality 4D reconstruction of human activities is critical for immersive VR/AR experiences, but existing solutions suffer from the inherent constraints of structured capture settings, i.e., they rely on a fixed capture volume or a specific chromatic background, tedious pre-calibration and synchronization, or extra manual labor. This Ph.D. thesis focuses on combining dynamic scene reconstruction with various unstructured sensors, which offers great potential for convenient, high-quality human activity modeling that is easy to deploy in countless daily applications.
Firstly, aiming at active dynamic 4D reconstruction in a non-intrusive manner, the autonomous flying camera (autonomous unmanned aerial vehicles (UAVs), each integrated with an RGBD video camera) is adopted. To this end, we propose a new-generation performance capture system, FlyCap, which automatically reconstructs detailed time-varying surface geometry of a moving target in general apparel over a wide space using multiple autonomous flying cameras, without resorting to user intervention, manual operation, or any markers in the scene. To further explore the problem of active dynamic scene reconstruction, we propose FlyFusion, the first system for active, realtime dynamic scene reconstruction with adaptive viewpoint selection based on a single flying camera. FlyFusion not only removes the constraints of a fixed recording volume, user discomfort, and the need for human labor and expertise, but also enables intelligent viewpoint selection based on the immediate dynamic reconstruction result. The merit of FlyFusion lies in its combined robustness, efficiency, and adaptivity in producing fused and denoised 3D geometry and motion of a moving target interacting with different non-rigid objects in a large space.
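To make the idea of adaptive viewpoint selection concrete, the following Python sketch scores a ring of candidate camera positions by how well they would cover currently under-observed surface regions and steers the flying camera toward the best one. The function names, the visibility proxy, and the flight-cost weight are illustrative assumptions, not the FlyFusion implementation.

    # Hypothetical sketch of an adaptive viewpoint-selection loop driven by the
    # immediate reconstruction result; scoring and parameters are illustrative.
    import numpy as np

    def candidate_viewpoints(center, radius=2.5, n=16, height=1.6):
        """Sample candidate camera positions on a circle around the target."""
        angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
        return np.stack([center[0] + radius * np.cos(angles),
                         center[1] + radius * np.sin(angles),
                         np.full(n, height)], axis=1)

    def coverage_score(viewpoint, surface_points, observation_counts):
        """Favor viewpoints facing surface regions that were observed least often."""
        dists = np.linalg.norm(surface_points - viewpoint, axis=1) + 1e-6
        visible = dists < 4.0                        # crude visibility proxy
        weights = 1.0 / (1.0 + observation_counts)   # under-observed points weigh more
        return float(np.sum(weights[visible]))

    def select_next_viewpoint(current_position, surface_points, observation_counts):
        """Pick the candidate that best covers under-observed geometry,
        penalizing long flights from the current camera position."""
        center = surface_points.mean(axis=0)
        candidates = candidate_viewpoints(center)
        scores = [coverage_score(c, surface_points, observation_counts)
                  - 0.2 * np.linalg.norm(c - current_position)
                  for c in candidates]
        return candidates[int(np.argmax(scores))]

In the actual system such a decision would be interleaved with realtime non-rigid fusion and drone path planning; the sketch only shows why the immediate reconstruction result can drive the camera.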
Moreover, for robust dynamic 4D reconstruction, we propose UnstructuredFusion, a practical realtime dynamic reconstruction method using unstructured commercial RGBD cameras. A flexible hardware setup of three unstructured RGBD cameras is adopted, without any tedious pre-calibration or synchronization. Our key idea is to use the human motion itself as the anchor that handles the spatially and temporally unstructured input. Extensive experiments, such as flexibly allocating the three cameras in a handheld manner, demonstrate that the proposed method achieves high-quality 4D geometry and texture reconstruction without tiresome pre-calibration, lifting the cumbersome hardware and software restrictions of conventional structured multi-camera systems while eliminating the inherent occlusion issues of a single-camera setup.
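The "human motion as anchor" idea can be illustrated with a minimal sketch: if two uncalibrated RGBD cameras both observe the same 3D body joints, their relative pose follows from a rigid (Kabsch/Umeyama) alignment of those joints. The snippet below is an assumed, simplified version of that step, not the UnstructuredFusion pipeline.

    # Illustrative sketch: recover the relative pose between two uncalibrated
    # RGBD cameras by rigidly aligning the 3D body joints they both observe,
    # i.e. using the captured human as the calibration anchor.
    import numpy as np

    def rigid_align(joints_a, joints_b):
        """Kabsch alignment mapping camera-B joints onto camera-A joints.

        joints_a, joints_b: (N, 3) arrays of corresponding 3D joint positions.
        Returns R (3x3) and t (3,) such that joints_a ~= joints_b @ R.T + t.
        """
        mu_a, mu_b = joints_a.mean(axis=0), joints_b.mean(axis=0)
        H = (joints_b - mu_b).T @ (joints_a - mu_a)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = mu_a - R @ mu_b
        return R, t

In the actual method, per-frame alignments of this kind would be fused and refined jointly with the non-rigid surface tracking; the snippet only illustrates why a moving body already constrains the unknown camera poses without a calibration target.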
Lastly, for high-speed dynamic 4D reconstruction that models temporally unstructured information, the bio-inspired event camera is adopted, which asynchronously measures per-pixel intensity changes at high temporal resolution. To this end, we propose EventCap, the first approach for 3D capture of high-speed human motion using a single event camera, which removes the lighting requirements, high data bandwidth, and consequent computational overhead of existing solutions. To tackle the challenges of sparse measurements, subtle inter-frame motion, and low signal-to-noise ratio (SNR), our method combines model-based optimization with CNN-based human pose detection to capture high-frequency motion details and to reduce drift in the tracking. As a result, we can capture fast motions at millisecond resolution with significantly higher data efficiency under challenging lighting conditions.
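The hybrid tracking idea can be sketched as follows: CNN pose detections at low-rate keyframes anchor the track, the asynchronous event stream is sliced into fixed-count batches, and each batch's pose is initialized by interpolation and then handed to model-based optimization (represented here by a generic refine callback). Everything below is a hypothetical simplification, not the EventCap implementation.

    # Hypothetical sketch of hybrid event-based tracking: CNN keyframe poses
    # anchor the sequence, event batches provide millisecond-scale timestamps.
    import numpy as np

    def batch_events(event_timestamps, events_per_batch=2000):
        """Slice the asynchronous event stream into fixed-count batches."""
        n = len(event_timestamps) // events_per_batch
        return [event_timestamps[i * events_per_batch:(i + 1) * events_per_batch]
                for i in range(n)]

    def interpolate_pose(t, t0, pose0, t1, pose1):
        """Linear pose initialization between two CNN keyframe detections."""
        alpha = (t - t0) / max(t1 - t0, 1e-9)
        return (1.0 - alpha) * pose0 + alpha * pose1

    def track_between_keyframes(event_timestamps, t0, pose0, t1, pose1, refine):
        """Initialize each event batch from the interpolated pose, then refine it
        against the batched events (refine stands in for model-based optimization)."""
        poses = []
        for batch in batch_events(event_timestamps):
            t_mid = float(np.median(batch))
            init = interpolate_pose(t_mid, t0, pose0, t1, pose1)
            poses.append(refine(init, batch))
        return poses

The design choice this illustrates is that the CNN detections suppress long-term drift while the event batches supply the high-frequency timing at which the pose is actually optimized.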