THESIS
2019
Abstract
Video super-resolution (VSR) is the task of generating a high-resolution (HR) video from its low-resolution (LR) counterpart. Convolutional neural network (CNN) models have recently been shown to be promising for VSR. However, previous models assume knowledge of the degradation operation that produces the LR version of the test video from its HR counterpart, and they rely on supervised learning, i.e., training on LR/HR pairs synthesized by artificial degradation operations. When the degradation operation is not available, their performance is limited. Moreover, previous approaches generate HR frames independently, leading to poor temporal consistency in the form of flickering artifacts.
We propose VistGAN, an unsupervised video super-resolution approach with temporal consistency built on a Generative Adversarial Network (GAN) architecture, which does not assume any degradation operation. VistGAN adopts an encoder-decoder architecture. The encoder degrades the HR training video to its LR version in an unsupervised way using a GAN. With our designed metric learning serving as the discriminator, the features of the generated LR version match well with those of the LR test video. To achieve temporal consistency in the HR domain, the decoder seeks to recover the HR training sequence from the LR frames using a frame-recurrent scheme based on high-resolution optical flow, the current test frame, and previously generated super-resolved frames. After training, the test video is super-resolved using only the decoder. We conduct extensive experiments on benchmark datasets. Compared with state-of-the-art schemes, VistGAN achieves much better performance in terms of temporal consistency (reducing the warping error by about 12.6%) and PSNR (improving it by up to 1.02 dB).
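The frame-recurrent decoding step described above can be illustrated with a minimal sketch. The code below is an assumption-based illustration, not the thesis implementation: it warps the previously super-resolved frame with a high-resolution optical flow field, fuses it with the upsampled current LR frame, and produces the next HR frame. The PyTorch setting, the names warp and RecurrentDecoder, and the layer sizes are all illustrative choices.

    # Minimal sketch of a frame-recurrent decoding step (illustrative only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def warp(frame, flow):
        # Backward-warp an HR frame (B,3,H,W) with a dense HR flow field (B,2,H,W).
        b, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2,H,W)
        coords = grid.unsqueeze(0) + flow                               # shift by flow
        # Normalize sampling coordinates to [-1, 1] for grid_sample.
        coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
        coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid_norm = torch.stack((coords_x, coords_y), dim=-1)           # (B,H,W,2)
        return F.grid_sample(frame, grid_norm, align_corners=True)

    class RecurrentDecoder(nn.Module):
        # Fuses the upsampled current LR frame with the warped previous HR output.
        def __init__(self, scale=4):
            super().__init__()
            self.scale = scale
            self.body = nn.Sequential(
                nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 3, 3, padding=1),
            )

        def forward(self, lr_t, prev_hr, hr_flow):
            lr_up = F.interpolate(lr_t, scale_factor=self.scale,
                                  mode="bilinear", align_corners=False)
            prev_warped = warp(prev_hr, hr_flow)    # temporal-consistency cue
            return self.body(torch.cat((lr_up, prev_warped), dim=1))

Feeding each newly generated HR frame back as prev_hr at the next step is what gives the scheme its recurrent, temporally consistent behavior; the actual network depth, flow estimator, and losses in VistGAN are not specified here.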