THESIS
2017
xix, 132 pages : illustrations ; 30 cm
Abstract
3-D imaging has advanced significantly in recent years, thanks to fast-growing
technologies in video capture, data compression and view synthesis. One of the
core tasks of 3-D imaging is interactive 3-D navigation, which lets users
navigate interactively in the 3-D scene instead of watching the content in a
fixed FOV (field of view) determined by the media producer. Building such an
interactive navigation system requires considering a complete processing chain
that includes 3-D scene representation, data compression and transmission, and
(virtual) view synthesis. A proper 3-D scene representation is important to the
entire system because it influences all subsequent processing modules. Image
plus depth is currently the most popular and widely used photo-realistic
representation of the 3-D scene. The depth map captures a 2-D projection of the
3-D geometry of the scene. With the help of the depth information, it is much
easier to reconstruct a virtual view using DIBR (depth-image-based rendering)
techniques. In this thesis, we study practical solutions for interactive 3-D
navigation based on the image plus depth representation.
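
As a rough illustration of how a depth map makes view synthesis easier, the
sketch below forward-warps a single image row to a horizontally shifted virtual
camera. It is a minimal sketch, not the rendering method used in the thesis: it
assumes rectified, parallel cameras, metric depth values, and a sign convention
in which pixels move left in the virtual view. The function name
`warp_row_dibr` and its parameters are hypothetical.

```python
def warp_row_dibr(colors, depths, focal_px, baseline, hole=None):
    """Forward-warp one image row to a virtual camera shifted by `baseline`.

    Assumption: rectified, parallel cameras, so each pixel moves only
    horizontally by disparity = focal_px * baseline / depth.
    """
    width = len(colors)
    out = [hole] * width              # pixels nobody maps to stay as holes
    zbuf = [float("inf")] * width     # nearest depth seen so far per target pixel
    for x in range(width):
        z = depths[x]
        disparity = focal_px * baseline / z
        xv = int(round(x - disparity))  # target column in the virtual view
        # Keep the sample only if it lands inside the row and is closer
        # than whatever was written there before (occlusion handling).
        if 0 <= xv < width and z < zbuf[xv]:
            out[xv] = colors[x]
            zbuf[xv] = z
    return out


# The near pixel "c" (depth 5) overwrites the far pixel "b" (depth 10),
# and the unfilled positions remain holes (None).
row = warp_row_dibr(["a", "b", "c", "d"], [10, 10, 5, 10], focal_px=10, baseline=1)
```

The holes left in the output are the disoccluded regions that a practical
renderer would have to fill, for example by inpainting or by blending a second
reference view.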
First, we study the acquisition and compression of depth maps, because depth
maps have different characteristics from natural images. On the acquisition
side, we study depth estimation from a stereo image pair, which is the
classical stereo matching problem in computer vision. We propose a convex
approach to the discrete multi-labeling problem of stereo matching by
reformulating it as a quadratic programming problem. On the compression side,
we propose a novel distortion metric for depth maps to replace the conventional
SSE (sum of squared errors) metric, because depth distortion affects the
quality of synthesized views differently from image distortion.
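
To make the motivation concrete, the sketch below computes the conventional SSE
and contrasts it with the pixel shift that a depth error would cause in a
synthesized view. It does not reproduce the metric proposed in the thesis; it
only illustrates, under assumed rectified cameras, metric depth values and
hypothetical camera parameters (`focal_px`, `baseline`), why equal SSE in depth
need not mean equal error in the rendered view.

```python
def sse(reference, distorted):
    """Conventional sum of squared errors between two equal-length sequences."""
    return sum((r - d) ** 2 for r, d in zip(reference, distorted))


def disparity_shift(z_true, z_coded, focal_px=1000.0, baseline=0.1):
    """Horizontal rendering error (in pixels) caused by coding z_true as z_coded.

    Assumption: rectified cameras, so disparity = focal_px * baseline / depth.
    The default camera parameters are illustrative only.
    """
    return abs(focal_px * baseline * (1.0 / z_true - 1.0 / z_coded))


# The same depth error of 0.5 gives the same SSE at both distances...
near_sse = sse([2.0], [2.5])
far_sse = sse([20.0], [20.5])

# ...but displaces a near pixel far more than a distant one in the warped view.
near_shift = disparity_shift(2.0, 2.5)
far_shift = disparity_shift(20.0, 20.5)
```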
Next, we move on to the problem of interactive 3-D navigation. To give users a
sufficient navigation range for exploring the 3-D scene, we use multiview
images plus depth maps to capture a wider FOV. The growing amount of image and
depth data captured by multiview cameras brings challenges in data storage and
compression. State-of-the-art 3-D video compression technology can compress the
image and depth data efficiently, but at the cost of reduced navigation
flexibility. We propose to organize the multiview data as navigation segments
that can be decoded and reconstructed independently of the rest of the data.
Navigation flexibility can then be tuned by changing the number and size of the
navigation segments.
Based on the proposed navigation segments, we further study practical solutions
to the interactive navigation problem for 1-D and 2-D navigation segments
respectively. In both cases, we consider an end-to-end navigation system
including data representation, compression, transmission and view synthesis,
and propose an optimization framework based on our novel rate and distortion
models. We further investigate practical solution methods for the 1-D and 2-D
cases in order to derive the optimal navigation segments, which achieve the
best trade-offs among navigation criteria such as resource consumption, viewing
quality and decoding complexity.