THESIS
2023
1 online resource (xiv, 100 pages) : color illustrations
Abstract
The generation of videos is a fundamental problem in computer vision. The ability to capture, understand, and reproduce the dynamics of our visual world is essential. Beyond its research interest, video synthesis has applications across a wide range of fields, including computer vision, computer graphics, and robotics. Video synthesis has two main research directions: conditional video synthesis and unconditional video synthesis.
In conditional video synthesis, videos are synthesized conditioned on inputs such as previous frames, semantic segmentation maps, edges, or text. Among these conditions, we focus on conditioning on previous frames. Synthesizing video from previous frames requires a comprehensive understanding of the scene content and of the motion of and interactions between objects. Previous methods fail to produce accurate motion and struggle to generate long videos.
Our first work addresses the motion-understanding problem in video synthesis. We propose FVS. Instead of treating the whole scene as a single component, our framework decomposes a scene into a static background, comprising the regions without self-motion, and moving objects. The appearance change of the static background is caused by camera motion alone, while that of moving objects combines camera motion with the objects' ego-motion. Decomposing the understanding of the content in this way increases the accuracy of video synthesis.
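The decomposition described above can be illustrated with a minimal flow-field sketch. This is an assumed formulation for illustration only, not the thesis implementation: background pixels move with the camera-induced flow alone, while pixels on a moving object additionally carry the object's ego-motion.

```python
import numpy as np

H, W = 4, 4
# Camera pans right by 1 px: this flow affects every pixel in the frame.
camera_flow = np.full((H, W, 2), [1.0, 0.0])

# Ego-motion exists only on the moving object (here, a 2x2 patch moving down 2 px).
ego_flow = np.zeros((H, W, 2))
object_mask = np.zeros((H, W), dtype=bool)
object_mask[1:3, 1:3] = True
ego_flow[object_mask] = [0.0, 2.0]

# Background pixels: flow = camera flow.  Object pixels: flow = camera + ego flow.
total_flow = camera_flow + ego_flow

print(total_flow[0, 0])  # background pixel: camera motion only
print(total_flow[1, 1])  # object pixel: camera motion plus ego-motion
```

Modeling the two sources of appearance change separately is what lets the two components be predicted with specialized sub-modules rather than a single monolithic one.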
In the first work, we explore how to improve video synthesis quality based on past frames. However, decomposing the scene into components requires semantic segmentation, which is a strong assumption that restricts the applicability of video synthesis algorithms. We therefore introduce an optimization framework built on a pre-trained video frame interpolation model. This approach requires neither additional assumptions about the scene nor external training, which makes the framework applicable to arbitrary scenes while achieving high quality.
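One way to read this framework, sketched under stated assumptions rather than as the thesis's actual method: optimize the unknown next frame directly so that a frozen interpolation model, applied to the previous frame and the candidate next frame, reconstructs the observed current frame. A toy midpoint interpolator stands in for the pre-trained network below.

```python
import numpy as np

def interp(a, b):
    """Toy stand-in for a frozen, pre-trained frame interpolation model."""
    return 0.5 * (a + b)

rng = np.random.default_rng(0)
x_prev = rng.random((8, 8))
x_curr = x_prev + 0.1           # simple constant motion between frames

x_next = x_curr.copy()          # initialize the frame being optimized
lr = 0.5
for _ in range(200):
    # Loss: 0.5 * ||interp(x_prev, x_next) - x_curr||^2, minimized over x_next.
    residual = interp(x_prev, x_next) - x_curr
    grad = 0.5 * residual       # analytic gradient w.r.t. x_next for the toy interp
    x_next -= lr * grad

# With a midpoint interpolator, the optimum is linear extrapolation: 2*x_curr - x_prev.
print(np.abs(x_next - (2 * x_curr - x_prev)).max() < 1e-3)  # True
```

With a real learned interpolator the gradient would come from backpropagation through the frozen network, but the structure of the optimization is the same: no scene assumptions and no additional training, only a reused prior.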
Beyond conditional video synthesis, we further explore unconditional video synthesis, which refers to generating novel, non-existing videos without any specific input. Concretely, we study 3D-aware generative models for video avatar generation and animation. We propose AniFaceGAN, an animatable 3D-aware face generation method that synthesizes highly realistic face images with controllable pose and expression sequences. We introduce a 3D parametric face model as a prior and use a 3D deformation field to realize the desired expression changes. We also introduce a set of 3D losses that enforce our deformation field to imitate the parametric model's deformation under expression variations. Our method generates realistic face videos with high visual quality.
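The interplay between the deformation field and the parametric prior can be sketched as follows. Everything here is an illustrative assumption, not AniFaceGAN's architecture: a deformation field maps canonical 3D points to expression-dependent positions, and a 3D imitation loss penalizes disagreement with the offsets produced by a parametric face prior (playing the role of a 3DMM). Both "networks" are linear toys.

```python
import numpy as np

def deformation_field(points, expression, weights):
    """Deforms canonical points, conditioned on an expression code (toy linear field)."""
    return points + expression @ weights            # (N, 3) + broadcast (3,)

def parametric_prior_offset(expression):
    """Toy stand-in for the 3D parametric face model's expression deformation."""
    return expression @ np.array([[0.0, 0.1, 0.0],
                                  [0.1, 0.0, 0.0],
                                  [0.0, 0.0, 0.1]])

points = np.zeros((5, 3))                            # canonical sample points
expression = np.array([1.0, 0.0, 0.0])
weights = np.zeros((3, 3))                           # untrained field: no deformation yet

deformed = deformation_field(points, expression, weights)
target = points + parametric_prior_offset(expression)

# 3D imitation loss: pulls the learned field toward the prior's deformation.
imitation_loss = np.mean((deformed - target) ** 2)
print(imitation_loss)
```

During training, minimizing such a loss over many expression codes would make the learned field inherit the prior's well-behaved expression semantics while remaining free to model details the parametric model cannot.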