THESIS
2014
xviii, 131 pages : illustrations ; 30 cm
Abstract
Multiview video plus depth (MVD) is currently the most popular and widely
accepted photo-realistic representation of a 3D scene. With the advent of consumer-level
depth capturing sensors, 3D information such as dense depth maps can now
be acquired cost-effectively from multiple viewpoints. A depth map constitutes a
projection of the 3D geometry in the scene to a 2D image/video of fixed resolution.
In this thesis, we explore advanced algorithms for improved representation
and seamless rendering based on MVD.
Firstly, as depth maps need to be denoised and compressed at the encoder
for improved representation and efficient network transmission to the decoder, we
consider the denoising and compression problems jointly, arguing that doing so
will yield better overall performance than the alternative of solving the two
problems separately in two stages. Specifically, we formulate a rate-constrained
estimation problem in which, given a set of observed noise-corrupted depth maps,
the most probable (maximum a posteriori (MAP)) 3D surface is sought within
a search space of surfaces with representation size no larger than a pre-specified
rate constraint.
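One way to write down the rate-constrained MAP problem described above is sketched here. The notation is an assumption made for illustration and is not taken from the thesis: d_1, ..., d_K denote the observed noisy depth maps, S a candidate 3D surface, R(S) its representation size, and R_max the pre-specified rate constraint.

```latex
\begin{equation}
  S^{\star} \;=\; \arg\max_{S \,:\, R(S) \,\le\, R_{\max}}
      P\!\left(S \mid d_1, \dots, d_K\right)
  \;=\; \arg\max_{S \,:\, R(S) \,\le\, R_{\max}}
      P\!\left(d_1, \dots, d_K \mid S\right) P(S)
\end{equation}
```

The second form follows from Bayes' rule, since the evidence P(d_1, ..., d_K) does not depend on S. The specific likelihood and prior models are not stated in the abstract and are left unspecified here.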
Secondly, we work on view synthesis, which is one of the typical rendering
tasks. In view synthesis, we need to render new views of a scene, starting from
a number of images taken from given viewpoints. This is often called
Depth-Image-Based Rendering (DIBR) when the 3D geometry is explicitly known as
a depth map. In particular, for the problem of synthesizing from stereo images,
we apply geometry compensation and reliability-based blending to integrate
the stereo views, thereby reducing visual artifacts. For the more challenging
problem of synthesizing from a single image, we present a novel optimization
approach named Visto, which uses one image plus one depth map to synthesize
seamless (natural and visually pleasing) virtual views at nearby viewpoints.
Visto addresses common challenges in DIBR, including inaccurate depth maps,
occlusions, disocclusions (or holes), ringing artifacts, and unnaturally sharp
edges, in an integral manner by formulating the view synthesis problem as a joint
optimization of inter-view texture and depth map similarity.
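To make the occlusion and disocclusion challenges concrete, the following is a minimal sketch of plain forward warping for a single image row. It is not the Visto algorithm. It assumes a rectified setup in which the virtual camera is shifted horizontally, so each pixel moves by a disparity of focal_px * baseline / depth. The function name, parameters, and sign convention are hypothetical and chosen only for illustration.

```python
def forward_warp_row(colors, depths, focal_px, baseline):
    """Warp one image row to a horizontally shifted virtual camera.

    Assumptions (illustrative, not from the thesis): rectified cameras,
    depths are positive distances, and a positive baseline shifts
    pixels to the left in the virtual view.
    """
    width = len(colors)
    warped = [None] * width            # None marks a disocclusion (hole)
    zbuf = [float("inf")] * width      # nearest depth seen per target pixel

    for x in range(width):
        z = depths[x]
        disparity = focal_px * baseline / z
        xt = int(round(x - disparity))  # target column in the virtual view
        # Z-buffer test: when two source pixels land on the same target,
        # keep the one closer to the camera (occlusion handling).
        if 0 <= xt < width and z < zbuf[xt]:
            warped[xt] = colors[x]
            zbuf[xt] = z
    return warped


row = forward_warp_row(["a", "b", "c", "d"], [10, 10, 1, 10], 1.0, 1.0)
print(row)  # ['a', 'c', None, 'd']
```

In this toy row the near pixel "c" moves one column left and occludes the background pixel "b", leaving a hole (None) at its old position. Holes of this kind, together with errors in the depth values themselves, are what a DIBR method has to fill or correct.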
Finally, we address the temporal consistency problem when synthesizing videos.
While virtual images at nearby viewpoints can be synthesized using DIBR
algorithms, directly extending such algorithms from images to videos by
synthesizing each frame independently would not, in general, produce a visually
pleasing result. A typical problem is the lack of content consistency across
frames. To address this problem, we extend our previous DIBR algorithm, Visto,
by regularizing the similarity across consecutive video frames based on a global
motion assumption. Doing so also alleviates the ill-posedness of the view
synthesis problem, as neighboring frames can provide additional useful
information for each other.