THESIS
2013
xiii, 65 p. : ill. ; 30 cm
Abstract
High Efficiency Video Coding (HEVC) is the next-generation video coding standard succeeding H.264/AVC. As a newly established standard targeted mainly at the efficient compression of high-definition (HD) and ultra-HD video
content, HEVC adopts many new techniques and more elaborate coding tools,
such as a nested quad-tree structure, advanced motion vector prediction (AMVP),
and asymmetric motion partition (AMP). In HEVC, the basic processing unit,
the coding tree unit (CTU), has a larger size of 64 × 64 luma samples and adopts
a nested quad-tree based partitioning scheme that supports a hierarchy depth of up to
4, compared to the mere 16 × 16 macroblock (MB) and limited choice of block
sizes in H.264/AVC. Also, HEVC has 35 intra prediction modes (IPM), while
H.264/AVC has at most 9 modes for intra prediction. Although these new
techniques contribute substantially to the coding efficiency gain of HEVC,
they add significant complexity to the encoder. It is therefore important to
study fast algorithms that make the encoder more practical in real applications.
To speed up the HEVC encoder, we propose two fast algorithms in this thesis.
The first, the fast PU quad-tree depth decision (FPDD) algorithm, speeds up the encoding process by skipping less probable PU sizes. It
does so by exploiting the inherent correlation of the PU quad-tree structure
between the current CTU and its spatial and temporal neighbors. To reduce error
propagation, we also propose a confidence grading scheme that prevents CTUs with
poor predictions from being referred to by others. Results show that the FPDD algorithm provides an average encoding time reduction of 20.0% (up to 39.3%) while
causing negligible RD performance loss (0.2% BD-Rate increase on average) compared with HM 7.0.
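The idea behind FPDD can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: the `NeighborCTU` type, the boolean `confident` flag standing in for the confidence grade, the one-level safety margin, and the assumption that depths run from 0 to 3 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

MAX_DEPTH = 3  # assumed: depths 0..3, i.e. four quad-tree levels


@dataclass
class NeighborCTU:
    """Depth statistics of an already-coded spatial or temporal neighbor."""
    min_depth: int
    max_depth: int
    confident: bool  # hypothetical stand-in for the confidence grade


def predict_depth_range(neighbors: Iterable[NeighborCTU]) -> Tuple[int, int]:
    """Return the (lowest, highest) depth the encoder would still test."""
    # Ignore neighbors whose own prediction was graded as unreliable,
    # so that a bad prediction does not propagate to the current CTU.
    trusted = [n for n in neighbors if n.confident]
    if not trusted:
        # No reliable reference: fall back to the exhaustive search.
        return 0, MAX_DEPTH
    # Widen the observed range by one level as a safety margin (assumed).
    lowest = max(0, min(n.min_depth for n in trusted) - 1)
    highest = min(MAX_DEPTH, max(n.max_depth for n in trusted) + 1)
    return lowest, highest
```

Depths outside the returned range would be skipped, which is where the time saving would come from in a scheme of this kind.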
The second, the fast rough mode decision (FRMD) algorithm,
further reduces the number of IPM candidates passed to the full RDO process. In FRMD, we analyze the costs generated by rough mode decision (RMD), which is already
incorporated in the HM software. We find that the RMD costs, listed
by mode number, generally follow the same trend as the rate-distortion optimization (RDO) costs. Further, the local salient modes, whose RMD costs show
a significant drop compared with those of adjacent modes, tend to be promising candidates for the optimal mode. Based on these observations, we further reduce
the number of candidates for the RDO process. Experimental results show
that the FRMD algorithm achieves an average encoding time saving of 19.0% (up to 33.6%)
while causing negligible RD performance loss (0.4% BD-Rate increase on
average) compared with the HM 7.0 anchor.
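The notion of a local salient mode can be sketched as follows. This is an illustrative reading of the description above, not the thesis's actual criterion: the function name, the relative threshold `drop_ratio`, and the restriction to angular modes (assumed to start at mode 2, since only angular modes have directionally adjacent neighbors) are assumptions.

```python
from typing import List, Sequence


def local_salient_modes(
    costs: Sequence[float],
    drop_ratio: float = 0.1,   # assumed: required relative drop in cost
    first_angular: int = 2,    # assumed: modes 0 and 1 are non-angular
) -> List[int]:
    """Return modes whose RMD cost is clearly below both adjacent modes."""
    salient = []
    for mode in range(first_angular, len(costs)):
        # Collect the costs of the adjacent angular modes that exist.
        adjacent = [
            costs[k]
            for k in (mode - 1, mode + 1)
            if first_angular <= k < len(costs)
        ]
        # A mode is salient if it undercuts every adjacent mode
        # by at least the given ratio.
        if adjacent and all(
            costs[mode] <= (1.0 - drop_ratio) * c for c in adjacent
        ):
            salient.append(mode)
    return salient
```

For example, with 35 costs that are all 100 except for a cost of 50 at mode 10, the function returns `[10]`; a candidate list for RDO could then be built from such modes rather than from a fixed number of lowest-cost modes.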
Apart from the above video coding research, the thesis also contains my
recent work on natural image matting, a classical topic that has attracted
growing attention recently. Natural image matting refers to
the problem of extracting regions of interest, such as a foreground object, from an
image based on user inputs such as scribbles or a trimap. More specifically, we need to
estimate the background color, the foreground color, and the corresponding
opacity, which is an inherently ill-posed problem. Inspired by closed-form matting
and KNN matting, we extend the local color line model, which is based on the
assumption of linear color clustering within a small local window, to a nonlocal
feature-space neighborhood, and thereby introduce a novel nonlocal color ball
model. With the proposed model, we capitalize on the nonlocal principle,
which gathers pixels with similar appearance from the whole image. A new affinity
matrix is defined to achieve better clustering, which ensures better prediction of
the alpha matte. Finally, a closed-form solution to the matting problem is obtained by
solving a sparse linear system. Experimental evaluations on benchmark datasets
and comparisons show that our results are of higher accuracy and better visual
quality than those of some state-of-the-art matting algorithms.
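For context, the sparse linear system in this family of methods typically arises from a quadratic energy of the following form, as in closed-form matting. The notation is assumed here rather than taken from the thesis, whose own system may differ in detail: L is the graph Laplacian built from the affinity matrix, D is a diagonal matrix marking user-constrained pixels, s holds the user-specified alpha values, and λ weights the constraints.

```latex
E(\alpha) = \alpha^{\top} L \alpha + \lambda\,(\alpha - s)^{\top} D\,(\alpha - s),
\qquad
\nabla_{\alpha} E = 0 \;\Longrightarrow\; (L + \lambda D)\,\alpha = \lambda D s
```

Because each pixel is connected to only a small number of neighbors, L is sparse, so the system can be handled by standard sparse solvers.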