THESIS
2015
xx, 116 pages : illustrations ; 30 cm
Abstract
Alongside audio applications, video applications such as video conferencing and Internet video sharing are becoming an integral part of people’s daily lives. The growing demand for these applications raises many new challenges. To address them, this thesis proposes several advanced coding techniques, focusing on four problems: multi-view stream switching, real-time video encoding, hardware-friendly algorithm design, and subjective quality improvement.
First, we address the problem of multi-view stream switching in video transmission. In this problem, a client switches freely in real time among a number of pre-encoded streams for personalized consumption. At a switching point, each stream can provide a distinct reference frame, and the client can switch from any stream to any other. As a result, there are multiple reconstructed frames predicted from different reference frames, which can cause an error-drifting problem or a storage problem. To obtain an identical reconstructed frame at the switching point, we propose an additional merge frame, which merges the different reconstructed frames into a single identical “merge” frame. Two methods are presented: the first merges the reconstructed frames into a target frame identically, while the second merges the reconstructed frames into the target frame in a rate-distortion optimized manner. Experimental results show substantial coding gains over related coding techniques.
Next, we address the problem of real-time video encoding. Sub-pixel motion estimation (ME) is one of the most computationally complex tools in video coding because of its interpolation operations and Hadamard transform. Based on the observation that the error surface in the sub-pixel ME area is smooth and convex, we model the error surface with simple polynomial models. Four algorithms are proposed in this thesis: two fast sub-pixel ME algorithms and two interpolation-free (IF) sub-pixel ME algorithms. Experimental results show that the two fast sub-pixel ME algorithms effectively reduce the sub-pixel ME complexity while preserving the coding performance, and that the two IF sub-pixel ME algorithms improve the coding performance compared to other IF algorithms.
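As a rough illustration of what modeling the error surface with a polynomial can look like, the sketch below fits a one-dimensional parabola through three integer-position matching costs in each direction and takes its minimum as the sub-pixel offset, with no interpolated samples needed. This is a generic textbook technique, not any of the four algorithms proposed in the thesis; the function names, the 3x3 cost grid, the separable (x then y) fit, and the quarter-pixel step are all assumptions made for the example.

```python
def parabolic_offset(c_left, c_center, c_right):
    """Return the minimum of the parabola through costs at positions -1, 0, +1.

    Assumes the surface is locally convex; if it is not (non-positive
    curvature), fall back to the integer position (offset 0).
    """
    curvature = c_left - 2.0 * c_center + c_right
    if curvature <= 0.0:
        return 0.0
    offset = (c_left - c_right) / (2.0 * curvature)
    # Keep the estimate inside the half-pixel neighborhood of the center.
    return max(-0.5, min(0.5, offset))


def estimate_subpel_mv(costs, step=0.25):
    """Estimate a sub-pixel motion offset from a 3x3 grid of integer-pel costs.

    costs[dy + 1][dx + 1] holds the matching cost at integer offset (dx, dy).
    The x and y directions are fitted independently (a simplifying assumption),
    and the result is quantized to multiples of `step` (hypothetical
    quarter-pixel precision).
    """
    dx = parabolic_offset(costs[1][0], costs[1][1], costs[1][2])
    dy = parabolic_offset(costs[0][1], costs[1][1], costs[2][1])
    return round(dx / step) * step, round(dy / step) * step


if __name__ == "__main__":
    # Synthetic convex error surface with its true minimum at (0.3, -0.2).
    grid = [[(dx - 0.3) ** 2 + (dy + 0.2) ** 2 + 5.0 for dx in (-1, 0, 1)]
            for dy in (-1, 0, 1)]
    print(estimate_subpel_mv(grid))
```

For an exactly quadratic surface the fit recovers the true minimum before quantization; on real matching costs it is only an approximation whose quality depends on how smooth and convex the surface actually is.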
Then, we address the problem of hardware-friendly algorithm design. During the video encoding process, neighboring information is frequently used to predict the current block because of its high spatial correlation. However, using neighboring information introduces a severe data dependency problem, which is undesirable for hardware implementation. In this thesis, we focus on a hardware-friendly algorithm design for merge/skip mode, which is one of the most efficient tools in video coding. We propose two reconfigurable methods to construct the merge candidate list for merge mode. Experimental results show that these two methods achieve a better balance between coding performance and hardware parallelism than other methods.
Finally, we address the problem of subjective quality improvement in video coding. Instead of using PSNR as the distortion metric, we propose to use the structural similarity (SSIM) index as the distortion metric to improve the subjective experience. By modeling the relationship between the original frame and the reconstructed frame, SSIM is approximated by a scaling of the sum of squared differences (SSD). Therefore, a traditional SSD-based video coding system can be easily modified into an SSIM-based video coding system. Experimental results show that the proposed SSIM-based framework outperforms other SSIM-based frameworks, especially in the low bit-rate region.
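To make the link between SSIM and SSD concrete, the sketch below computes the standard SSIM index for one block and shows a well-known special case: when the original and reconstructed blocks have equal means, 1 - SSIM equals the SSD multiplied by a block-dependent scale factor. This is only an illustration of why a scaled SSD can stand in for SSIM; it is not the model derived in the thesis. The function names are hypothetical, and the unweighted single-block statistics and the 8-bit peak value of 255 are assumptions (the constants 0.01 and 0.03 are the usual SSIM defaults).

```python
def _stats(x, y):
    """Population means, variances and covariance of two equal-length blocks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cxy


def ssim_block(x, y, peak=255.0):
    """Standard SSIM index computed over a single unweighted block."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my, vx, vy, cxy = _stats(x, y)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def ssd(x, y):
    """Sum of squared differences between two blocks."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def ssd_scale(x, y, peak=255.0):
    """Scale s with 1 - SSIM == s * SSD, valid when the block means are equal."""
    c2 = (0.03 * peak) ** 2
    _, _, vx, vy, _ = _stats(x, y)
    return 1.0 / (len(x) * (vx + vy + c2))


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    reconstructed = [12, 18, 32, 38]  # same mean as `original`
    print(1.0 - ssim_block(original, reconstructed))
    print(ssd_scale(original, reconstructed) * ssd(original, reconstructed))
```

When the block means differ, the identity no longer holds exactly, so a practical scheme would need a model of how the reconstructed block relates to the original, which is the role of the modeling step described above.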