Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression
IEEE Transactions on Circuits and Systems for Video Technology (2024)
Abstract
Video compression performance is closely related to the accuracy of inter
prediction. Accurate inter prediction is difficult to obtain for local video
regions with inconsistent motion and occlusion. Traditional
video coding standards propose various technologies to handle motion
inconsistency and occlusion, such as recursive partitions, geometric
partitions, and long-term references. However, existing learned video
compression schemes focus on obtaining an overall minimized prediction error
averaged over all regions while ignoring the motion inconsistency and occlusion
in local regions. In this paper, we propose a spatial decomposition and
temporal fusion based inter prediction for learned video compression. To handle
motion inconsistency, we propose to decompose the video into structure and
detail (SDD) components first. Then we perform SDD-based motion estimation and
SDD-based temporal context mining for the structure and detail components to
generate short-term temporal contexts. To handle occlusion, we propose to
propagate long-term temporal contexts by recurrently accumulating the temporal
information of each historical reference feature, and to fuse them with the
short-term temporal contexts. With the SDD-based motion model and the fusion of
long- and short-term temporal contexts, our proposed learned video codec can obtain more accurate
inter prediction. Comprehensive experimental results demonstrate that our codec
outperforms the reference software of H.266/VVC on all common test datasets for
both PSNR and MS-SSIM.
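As a rough illustration of the two mechanisms named in the abstract: the paper's SDD decomposition and temporal context fusion are learned end-to-end, so the fixed box filter and exponential recurrence below are stand-in assumptions for illustration only, not the authors' method. A frame can be split into a low-pass "structure" component plus a "detail" residual, and historical reference features can be recurrently accumulated into a single long-term context.

```python
import numpy as np

def decompose_frame(frame: np.ndarray, ksize: int = 5):
    """Split a grayscale frame into structure (low-pass) and detail (residual).

    A separable box blur stands in for the learned SDD decomposition;
    by construction, structure + detail reconstructs the frame exactly.
    """
    pad = ksize // 2
    padded = np.pad(frame.astype(np.float64), pad, mode="edge")
    kernel = np.ones(ksize) / ksize
    # Separable mean filter: blur rows, then columns ("valid" restores shape)
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, blurred)
    structure = blurred
    detail = frame - structure
    return structure, detail

def accumulate_long_term(history, decay: float = 0.9) -> np.ndarray:
    """Recurrently fold historical reference features into a long-term context.

    Exponential moving average is an assumed, simplified recurrence; the
    paper propagates learned features rather than this fixed update.
    """
    context = np.zeros_like(history[0], dtype=np.float64)
    for feature in history:
        context = decay * context + (1.0 - decay) * feature
    return context
```

The long-term context could then be concatenated or blended with the short-term (motion-compensated) contexts before prediction; the blending itself is learned in the paper.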
Keywords
Inter prediction, learned video compression, spatial decomposition, occlusion, temporal fusion