THESIS
2024
1 online resource (xiv, 87 pages) : color illustrations
Abstract
Among current developments in artificial intelligence (AI), the most relevant to this work are embodied AI systems, in which robotic agents interact with human users in real-world environments. To serve in such a system, a robotic agent needs visual intelligence, namely three-dimensional (3D) perception of its surrounding environment. This thesis is dedicated to enabling real-time, low-cost edge deployment of visual perception, which we advocate as the future solution for embodied AI applications.
First, CDRNet, an end-to-end deep neural network pipeline for machine visual perception, is proposed. It jointly perceives the geometric structure and semantic labels of a 3D scene. Conventional volumetric approaches to 3D perception tend to focus on the global coherence of their reconstructions, which leads to a lack of local geometric detail. CDRNet instead leverages the latent geometric prior knowledge in 2D image features, through explicit depth prediction and anchored feature generation, to refine occupancy learning in the truncated signed distance function (TSDF) volume.
We further find that this cross-dimensional feature refinement methodology can also be applied to the semantic segmentation task by utilizing semantic priors, so that both a 3D mesh and 3D semantic labels are extracted in real time. Beyond testing on public datasets, we implement a real-time messaging system to support these perception tasks in real-life scenarios.
Finally, a software-hardware co-optimization system, Efficient-Grad, is proposed for online AI model fine-tuning. It improves both throughput and energy savings with negligible accuracy degradation during the training of deep convolutional neural networks, by exploiting the sparsity and asymmetry residing in the gradients of conventional backpropagation. Furthermore, a dedicated hardware architecture for sparsity utilization and efficient data movement is optimized to support the Efficient-Grad algorithm in a scalable manner, which leads to its superior energy efficiency.