THESIS
2023
1 online resource (xii, 38 pages) : color illustrations
Abstract
Since it is challenging for us to acquire per-pixel ground-truth scene depths in real world, it
is significant for researchers to develop self-supervised depth estimation frameworks. In recent
years, self-supervised monocular depth estimation has shown impressive results where networks are
trained to predict depth map for a single image frame by using adjacent frames as supervision signal
during training period. Meanwhile, in many applications, information of video sequences are also
available at test time. Many researchers found that multi-view stereo (MVS) depth estimation based
on cost volume usually works better than monocular schemes except for moving objects and low-textured
surfaces. Based on these facts, we hope to combine advantages of monocular and multi-view
schemes and desig...[
Read more ]
Since it is challenging for us to acquire per-pixel ground-truth scene depths in real world, it
is significant for researchers to develop self-supervised depth estimation frameworks. In recent
years, self-supervised monocular depth estimation has shown impressive results where networks are
trained to predict depth map for a single image frame by using adjacent frames as supervision signal
during training period. Meanwhile, in many applications, information of video sequences are also
available at test time. Many researchers found that multi-view stereo (MVS) depth estimation based
on cost volume usually works better than monocular schemes except for moving objects and low-textured
surfaces. Based on these facts, we hope to combine advantages of monocular and multi-view
schemes and design a new integrated depth estimation framework with better performance.
In this paper, we first introduce several representative self-supervised depth estimation frameworks
in recent years, including monocular and multi-view cases. Besides, to reduce the influence
of observation noises (e.g., occlusion and moving objects), we introduce the concept of Bayesian
uncertainty and explain how to improve the depth accuracy with uncertainty estimation. Then we will propose a multi-frame depth estimation framework where monocular depth map can be refined
continuously by multi-frame sequential constraints, leveraging a Bayesian fusion layer within several
iterations. Both monocular and multi-view networks can be trained with no depth supervision.
Our method also enhances the interpretability when combining monocular estimation with multi-view
cost volume.
Post a Comment