THESIS
2023
1 online resource (xiii, 90 pages) : illustrations (some color)
Abstract
Matting has long been a primary technique for image/video editing. Traditional matting
methods formulated the matting problem and made preliminary explorations, but their
performance is limited by their reliance on low-level image features. This issue has been
addressed to a considerable extent by the introduction of deep neural networks. However,
the rapid development of the multimedia industry in recent years has posed new challenges,
including diverse media content and application scenarios, commodity-level devices with
limited resources, and the popularity of HD/UHD displays. To overcome these
challenges, this thesis explores the matting task from four perspectives: accuracy
of image matting, temporal coherence of video matting, efficiency of image and video
matting, and instance-level matting.
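For readers new to the field, the "matting problem" is usually stated through the standard compositing equation: each observed pixel I is a blend of a foreground color F and a background color B weighted by an opacity alpha, I = alpha * F + (1 - alpha) * B. Matting is the inverse problem of recovering alpha (and often F) from I alone, which is underdetermined. The minimal sketch below shows only the forward (compositing) direction in plain Python; the flat list of RGB tuples and the function name `composite` are illustrative choices, not part of the thesis.

```python
def composite(alpha, fg, bg):
    """Forward compositing I = alpha * F + (1 - alpha) * B for a flat list
    of pixels; fg and bg hold RGB tuples, alpha holds floats in [0, 1]."""
    if not (len(alpha) == len(fg) == len(bg)):
        raise ValueError("alpha, fg and bg must have the same length")
    out = []
    for a, f, b in zip(alpha, fg, bg):
        if not 0.0 <= a <= 1.0:
            raise ValueError("alpha values must lie in [0, 1]")
        # Blend channel by channel: opaque pixels keep F, transparent keep B.
        out.append(tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b)))
    return out


# A red foreground over a blue background at three opacity levels.
print(composite([1.0, 0.0, 0.5], [(255, 0, 0)] * 3, [(0, 0, 255)] * 3))
# → [(255.0, 0.0, 0.0), (0.0, 0.0, 255.0), (127.5, 0.0, 127.5)]
```

A matting method has to run this relation backwards: given only the blended colors on the printed line, it must estimate the three alpha values.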
The first study improves image matting performance by exploiting semantic information
in alpha mattes. We propose Semantic Image Matting (SIM), which reasons about the underlying
causes of matting arising from different foreground objects and incorporates semantic
classification of matting regions to obtain better alpha mattes. The method extends the
conventional trimap to a semantic trimap, learns a multi-class discriminator to regularize
alpha prediction at the semantic level, and learns content-sensitive weights to balance the
different regularization losses. SIM outperforms other methods and achieves state-of-the-art
performance on multiple benchmarks.
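To make the trimap extension concrete: a conventional trimap labels every pixel as background, foreground, or unknown, and it can be derived from a ground-truth alpha matte by thresholding. The abstract does not specify how SIM encodes its semantic trimap, so the sketch below assumes one plausible encoding in which each unknown pixel additionally carries a one-hot vector over matting classes. The helper names, the per-pixel `class_ids` input, and the class indices are all hypothetical.

```python
BACKGROUND, FOREGROUND, UNKNOWN = 0, 1, 2


def trimap_from_alpha(alpha, eps=1e-6):
    """Conventional trimap from a flat list of alpha values:
    0 = background, 1 = foreground, 2 = unknown (fractional alpha)."""
    tri = []
    for a in alpha:
        if a <= eps:
            tri.append(BACKGROUND)
        elif a >= 1.0 - eps:
            tri.append(FOREGROUND)
        else:
            tri.append(UNKNOWN)
    return tri


def semantic_trimap(trimap, class_ids, num_classes):
    """Hypothetical semantic trimap: each unknown pixel carries a one-hot
    vector over matting classes; known pixels carry all zeros."""
    out = []
    for t, c in zip(trimap, class_ids):
        onehot = [0] * num_classes
        if t == UNKNOWN:
            onehot[c] = 1
        out.append((t, onehot))
    return out


tri = trimap_from_alpha([0.0, 0.3, 1.0, 0.7])
print(tri)  # → [0, 2, 1, 2]
print(semantic_trimap(tri, [0, 2, 0, 1], num_classes=3))
```

The design point is that the extra semantic channel only matters where the trimap is unknown; fully opaque or transparent pixels need no class label.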
The second study proposes a deep learning-based video matting framework (DVM)
that uses a spatio-temporal feature aggregation module (ST-FAM) to address the inherent
technical challenges of reasoning in the temporal domain. ST-FAM aligns and aggregates
high-dimensional temporal features across multiple frames through deformable convolution,
overcoming the unreliability of optical flow estimation within matting regions.
The study also introduces a lightweight trimap propagation network that eliminates the need
for frame-by-frame trimap annotation.
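The key operation behind this alignment is deformable convolution: instead of reading the input at a fixed 3x3 grid around position p, each kernel tap k reads at p + p_k + delta_p_k, where the fractional offsets delta_p_k are predicted by the network and the input is sampled with bilinear interpolation, giving out(p) = sum_k w_k * x(p + p_k + delta_p_k). The sketch below implements only this generic sampling rule on a single-channel 2-D list; how ST-FAM predicts the offsets from neighboring frames is not described in the abstract, so the offsets here are supplied by hand, and all function names are hypothetical.

```python
import math


def bilinear(img, y, x):
    """Sample a 2-D list at fractional (y, x); out-of-range taps read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def deformable_sample(img, weights, offsets, py, px):
    """One output value of a 3x3 deformable convolution at (py, px).
    weights and offsets are listed in row-major kernel order; offsets holds
    nine (dy, dx) pairs that a real network would predict per position."""
    grid = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]
    out = 0.0
    for w, (gy, gx), (oy, ox) in zip(weights, grid, offsets):
        out += w * bilinear(img, py + gy + oy, px + gx + ox)
    return out


img = [[10 * y + x for x in range(5)] for y in range(5)]
center_only = [0, 0, 0, 0, 1, 0, 0, 0, 0]
zero = [(0, 0)] * 9
shifted = [(0, 0)] * 4 + [(0.5, 0.5)] + [(0, 0)] * 4
print(deformable_sample(img, center_only, zero, 2, 2))     # → 22.0
print(deformable_sample(img, center_only, shifted, 2, 2))  # → 27.5
```

With all offsets at zero this reduces to an ordinary 3x3 convolution (strictly, cross-correlation); the learned offsets let the kernel follow content that has moved between frames without computing an explicit optical flow field.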
The third study proposes SparseMat, a computationally efficient approach for ultra-high
resolution (UHR) image/video matting. Existing matting algorithms cannot process UHR images
at full resolution on consumer-level computational platforms without running out of memory.
SparseMat exploits spatial and temporal sparsity to reduce computational redundancy in
general UHR matting. The method generates high-quality alpha mattes for UHR images and
videos at their original high resolution in a single pass.
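The intuition behind spatial and temporal sparsity can be illustrated with a pixel-selection mask: only a small fraction of a UHR frame needs expensive high-resolution processing. The abstract does not give SparseMat's actual selection criteria, so the sketch below assumes two simple rules for illustration: a pixel is active only if a coarse alpha estimate is uncertain (neither near 0 nor near 1), and, for video, only if its intensity changed noticeably since the previous frame (unchanged pixels would reuse the previous result). The function names, grayscale inputs, and thresholds are hypothetical.

```python
def sparse_mask(coarse_alpha, cur=None, prev=None, eps=0.01, diff_thresh=10.0):
    """Hypothetical active-pixel mask for sparse high-resolution refinement.
    coarse_alpha: 2-D list of coarse alpha estimates in [0, 1].
    cur, prev: optional 2-D grayscale frames of the same size."""
    mask = []
    for y, row in enumerate(coarse_alpha):
        mrow = []
        for x, a in enumerate(row):
            # Spatial sparsity: confidently opaque or transparent pixels can
            # take their value straight from the coarse prediction.
            uncertain = eps < a < 1.0 - eps
            changed = True
            if cur is not None and prev is not None:
                # Temporal sparsity: pixels that barely changed since the
                # previous frame could reuse the previous frame's matte.
                changed = abs(cur[y][x] - prev[y][x]) > diff_thresh
            mrow.append(uncertain and changed)
        mask.append(mrow)
    return mask


def active_fraction(mask):
    """Share of pixels that would actually be refined."""
    total = sum(len(r) for r in mask)
    return sum(sum(r) for r in mask) / total if total else 0.0


coarse = [[0.0, 0.5, 1.0], [0.2, 1.0, 0.0]]
prev = [[0, 100, 0], [0, 0, 0]]
cur = [[0, 100, 0], [50, 0, 0]]
print(sparse_mask(coarse))             # two uncertain pixels
print(sparse_mask(coarse, cur, prev))  # only the one that also changed
print(active_fraction(sparse_mask(coarse, cur, prev)))
```

Because the cost of the refinement step scales with the number of active pixels rather than the full resolution, a mask like this is what makes single-pass processing of very large frames plausible on limited memory.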
The last study proposes the new task of instance matting (IM), which requires precise alpha
matte prediction for each instance. To solve instance matting, the study introduces
InstMatt, which tackles technical challenges such as mingled colors and overlapping boundaries.
InstMatt includes a novel mutual guidance strategy and a multi-instance refinement
module to delineate multi-instance relationships. InstMatt produces high-quality
instance-level alpha mattes and can be adapted to different classes.
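Instance matting differs from ordinary matting in that each pixel receives one alpha value per instance, so predictions for overlapping instances must be mutually consistent. The abstract does not detail InstMatt's mutual guidance strategy or refinement module, and the sketch below is not either of them. It only illustrates the kind of cross-instance constraint that overlapping boundaries call for, under an assumption made for illustration: at each pixel the instance alphas should not sum to more than 1, and whatever coverage remains belongs to the background. The function name and data layout are hypothetical.

```python
def normalize_instance_alphas(alphas):
    """alphas: K lists (one per instance) of N per-pixel alpha values.
    Hypothetical consistency step: where overlapping instance alphas sum to
    more than 1, divide them by their sum; also return the background alpha
    implied by 1 - sum of instance alphas."""
    if not alphas:
        return [], []
    n = len(alphas[0])
    if any(len(a) != n for a in alphas):
        raise ValueError("every instance needs one alpha per pixel")
    out = [[0.0] * n for _ in alphas]
    background = []
    for p in range(n):
        total = sum(a[p] for a in alphas)
        for i, a in enumerate(alphas):
            # Only rescale where the instances jointly claim >100% coverage.
            out[i][p] = a[p] / total if total > 1.0 else a[p]
        # Clamp at 0 to absorb floating-point round-off after rescaling.
        background.append(max(0.0, 1.0 - sum(o[p] for o in out)))
    return out, background


# Two instances that overlap at the middle pixel.
inst, bg = normalize_instance_alphas([[1.0, 0.75, 0.0], [0.0, 0.75, 0.25]])
print(inst)  # → [[1.0, 0.5, 0.0], [0.0, 0.5, 0.25]]
print(bg)    # → [0.0, 0.0, 0.75]
```

A learned model would resolve such conflicts using image evidence rather than uniform rescaling; the fixed rule here only makes the joint constraint explicit.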