THESIS
2020
1 online resource (xii, 121 pages) : illustrations (some color)
Abstract
As the amount of image and video data grows explosively, there is a constant hunger for higher-quality video compression algorithms. Inspired by the great success of neural networks in many disciplines, in this thesis I focus on improving the efficiency of video coding based on neural networks. First, a quality enhancement network for versatile video coding (VVC) is proposed, which consists of a temporal fusion subnet and a spatial detail enhancement subnet that jointly explore effective prior information.
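As a rough illustration of this two-subnet design, the following PyTorch sketch fuses a window of neighboring decoded frames and then predicts a residual correction for the current frame; the module names, layer counts, and channel widths are illustrative assumptions, not the architecture used in the thesis.

    import torch
    import torch.nn as nn

    class TemporalFusionSubnet(nn.Module):
        """Fuses a window of neighboring decoded frames into one feature map."""
        def __init__(self, num_frames=3, channels=64):
            super().__init__()
            # Stack frames along the channel axis and fuse with 3x3 convolutions
            # (illustrative depth and width; the thesis does not specify these).
            self.fuse = nn.Sequential(
                nn.Conv2d(num_frames, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            )

        def forward(self, frames):              # frames: (B, T, H, W), e.g. luma only
            return self.fuse(frames)

    class SpatialDetailSubnet(nn.Module):
        """Predicts a residual detail map for the current frame from fused features."""
        def __init__(self, channels=64):
            super().__init__()
            self.refine = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, 3, padding=1),
            )

        def forward(self, feats):
            return self.refine(feats)

    class QualityEnhancementNet(nn.Module):
        def __init__(self, num_frames=3):
            super().__init__()
            self.temporal = TemporalFusionSubnet(num_frames)
            self.spatial = SpatialDetailSubnet()

        def forward(self, frames, current):     # residual learning on the current frame
            return current + self.spatial(self.temporal(frames))

For example, net = QualityEnhancementNet() maps a (1, 3, H, W) stack of decoded frames plus the (1, 1, H, W) current frame to an enhanced frame of the same size.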
Second, I present a robust multi-frame guided attention network (MGANet) for HEVC-compressed videos. The combination of an advanced motion flow algorithm and a temporal encoder greatly improves its ability to exploit temporal information. In addition, the transform unit (TU) partition information is employed to guide the network to focus on coding block boundaries.
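One common way to exploit such partition information is to convert the TU partition map into a per-pixel attention gate over the feature maps; the sketch below illustrates that idea under assumed shapes and layer choices, and is not MGANet's actual attention module.

    import torch
    import torch.nn as nn

    class PartitionGuidedAttention(nn.Module):
        """Gates features with an attention map derived from the TU partition mask,
        steering the network toward coding block boundaries (illustrative design)."""
        def __init__(self, channels=64):
            super().__init__()
            # Map the single-channel partition mask to a per-pixel gate in (0, 1).
            self.gate = nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid(),
            )

        def forward(self, feats, partition_mask):
            # feats: (B, C, H, W); partition_mask: (B, 1, H, W), 1 on TU boundaries.
            attn = self.gate(partition_mask)
            # Residual gating: boundary regions are amplified while off-boundary
            # detail is preserved.
            return feats + feats * attn

The residual form feats + feats * attn is one of several reasonable gating choices; a pure multiplicative gate would also fit the description in the abstract.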
Third, a guided attention generative network (GAGNet) is first proposed to generate high-quality frames. Then, based on GAGNet, an adversarial network is designed to generate higher-quality reconstructed videos; its generator is trained with an added generative loss term to recover more of the high-frequency information in compressed videos.
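The generator objective described here can be pictured as a reconstruction loss plus a weighted adversarial term. The sketch below uses an L1 reconstruction loss and a non-saturating GAN loss as stand-ins, since the abstract does not specify the exact loss terms or their weight.

    import torch
    import torch.nn.functional as F

    def generator_loss(restored, target, disc_fake_logits, adv_weight=1e-3):
        """Reconstruction term plus an adversarial (generative) term that pushes
        the generator to synthesize plausible high-frequency detail."""
        rec = F.l1_loss(restored, target)
        # Non-saturating GAN loss: the generator tries to make the discriminator
        # label its outputs as real (label = 1).
        adv = F.binary_cross_entropy_with_logits(
            disc_fake_logits, torch.ones_like(disc_fake_logits))
        return rec + adv_weight * adv

A small adv_weight keeps the reconstruction term dominant, so the adversarial term sharpens textures without letting the generator hallucinate content.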
Fourth, I develop an end-to-end learned image compression framework with a large-capacity, low-redundancy latent representation (LICLL). Two novel enhancement modules are designed to improve the rate-distortion (R-D) performance of the proposed network.
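End-to-end learned codecs of this kind are typically trained with a rate-distortion objective of the form L = R + lambda * D. The sketch below shows that standard objective with an assumed entropy-model interface (a tensor of latent likelihoods); it is not LICLL's specific formulation, and the two enhancement modules are not modeled here.

    import torch
    import torch.nn.functional as F

    def rate_distortion_loss(x, x_hat, likelihoods, lam=0.01):
        """L = R + lambda * D: estimated bits per pixel of the latents (rate)
        plus weighted mean squared error (distortion)."""
        num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
        # Rate: expected code length, in bits per pixel, from the entropy model.
        bpp = torch.sum(-torch.log2(likelihoods)) / num_pixels
        mse = F.mse_loss(x_hat, x)
        return bpp + lam * mse

Sweeping lam traces out the R-D curve; larger values favor fidelity at a higher bitrate, which is how coding gains like those reported below are measured against a baseline.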
Simulation results demonstrate that these proposed methods achieve excellent performance. In particular, compared with state-of-the-art methods, MGANet achieves gains of 0.6002 dB and 1.0934 dB under the all-intra (AI) and low-delay-P (LDP) configurations, respectively. In addition, LICLL achieves more than 1 dB of coding gain on the widely used high-resolution test image set.