THESIS
2023
1 online resource (xiv, 100 pages) : illustrations (chiefly color)
Abstract
The increasing demand for high-quality visual content, including 2D images and 3D
models, is evident across various applications, such as virtual reality and video games.
Recently, a plethora of deep generative models has enabled the creation of visual content
on an unprecedented scale and at remarkable speed. Nonetheless, to achieve generation
under different control conditions, e.g., attribute editing, sketches, and text prompts,
collecting corresponding large-scale training data is challenging due to copyright, privacy,
and collection costs. The limited availability of data and computing resources can
in turn hamper generation quality.
This thesis aims to explore a new generative paradigm that leverages well-trained
foundation generative models to boost visual content creation. We begin with
high-fidelity face image editing, where we embed real images into the latent space of
well-trained generative adversarial networks (GANs). Our framework supports editing of
various attributes within a unified model while preserving image-specific details such as
background and illumination.
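
As a rough illustration of this latent-space editing idea (not the thesis's actual framework), the sketch below optimizes a latent code so that a pretrained generator reconstructs a target image, then shifts the code along an attribute direction. The generator G, its latent_dim attribute, and attr_direction are hypothetical placeholders.

```python
# Minimal sketch of optimization-based GAN inversion followed by latent editing.
# G, G.latent_dim, and attr_direction are assumed placeholders, not the thesis's API.
import torch

def invert_and_edit(G, target, attr_direction, steps=500, lr=0.05, strength=2.0):
    """Find a latent code w that reconstructs `target`, then shift it along
    `attr_direction` (e.g., a smile or age axis) to produce the edited image."""
    w = torch.zeros(1, G.latent_dim, requires_grad=True)      # assumed latent size
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        recon = G(w)                                           # synthesize from latent
        loss = torch.nn.functional.mse_loss(recon, target)     # pixel reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    edited = G(w.detach() + strength * attr_direction)         # move along attribute axis
    return edited
```

In practice, inversion methods typically add perceptual and identity losses on top of the pixel loss so that background and illumination details survive the edit; the sketch keeps only the core loop.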
Next, we move on to the controllable generation of general images beyond faces. Rather
than using GANs, which mainly work for specific domains (e.g., faces), we opt for
diffusion models, which have shown impressive expressivity in synthesizing complex
and general images. With pretraining, we propose a unified architecture to boost a
variety of image-to-image translation tasks.
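
A minimal sketch of how a pretrained diffusion model can be adapted for paired image-to-image translation is shown below: the source image is concatenated with the noisy target and the denoiser learns to predict the added noise. The names denoiser and alphas_cumprod are assumptions for illustration, not the thesis's interface.

```python
# Illustrative training step for diffusion-based image-to-image translation.
# `denoiser` and `alphas_cumprod` are placeholders standing in for a pretrained
# noise-prediction network and its noise schedule.
import torch

def i2i_training_step(denoiser, alphas_cumprod, source, target, optimizer):
    """One epsilon-prediction step with channel-concatenation conditioning."""
    b = target.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))              # random timestep per sample
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)                    # cumulative noise schedule
    noise = torch.randn_like(target)
    noisy = a_bar.sqrt() * target + (1 - a_bar).sqrt() * noise    # forward diffusion q(x_t | x_0)
    pred = denoiser(torch.cat([noisy, source], dim=1), t)         # condition on the source image
    loss = torch.nn.functional.mse_loss(pred, noise)              # standard denoising objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```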
Besides 2D images, we also extend this pretraining philosophy to 3D content creation.
We propose a 3D generative model that uses a diffusion model to automatically generate
3D avatars represented as neural radiance fields. Building upon this foundational
generative model for avatars, we further demonstrate 3D avatar creation from an image or
a text prompt while allowing for text-based semantic editing.
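
To make the neural-radiance-field representation concrete, the sketch below volume-renders rays through a radiance field conditioned on a latent code; such a latent could in principle be produced by a generative model. The decoder function and its signature are assumptions, and the thesis's avatar representation and diffusion sampler are not reproduced here.

```python
# Minimal volume-rendering sketch for a latent-conditioned radiance field.
# `decoder(z, pts)` is a hypothetical network returning per-point density and color.
import torch

def render_rays(decoder, z, origins, directions, near=0.1, far=4.0, n_samples=64):
    """Alpha-composite color along each ray for a radiance field conditioned on z."""
    ts = torch.linspace(near, far, n_samples)                                  # sample depths
    pts = origins[:, None, :] + ts[None, :, None] * directions[:, None, :]     # (rays, samples, 3)
    density, color = decoder(z, pts)                                           # (R, S, 1), (R, S, 3)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-torch.relu(density) * delta)                      # opacity per sample
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1] + 1e-10], dim=1), dim=1)
    weights = alpha * trans                                                    # compositing weights
    return (weights * color).sum(dim=1)                                        # (R, 3) rendered pixels
```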