Video Super-Resolution Transformer with Masked Inter Intra-Frame Attention
CoRR(2024)
Abstract
Recently, Vision Transformer has achieved great success in recovering missing
details in low-resolution sequences, i.e., the video super-resolution (VSR)
task. Despite its superiority in VSR accuracy, the heavy computational burden as
well as the large memory footprint hinder the deployment of Transformer-based
VSR models on constrained devices. In this paper, we address the above issue by
proposing a novel feature-level masked processing framework: VSR with Masked
Intra and inter frame Attention (MIA-VSR). The core of MIA-VSR is leveraging
feature-level temporal continuity between adjacent frames to reduce redundant
computations and make more rational use of previously enhanced SR features.
Concretely, we propose an intra-frame and inter-frame attention block which
takes the respective roles of past features and input features into
consideration and only exploits previously enhanced features to provide
supplementary information. In addition, an adaptive block-wise mask prediction
module is developed to skip unimportant computations according to feature
similarity between adjacent frames. We conduct detailed ablation studies to
validate our contributions and compare the proposed method with recent
state-of-the-art VSR approaches. The experimental results demonstrate that
MIA-VSR improves the memory and computation efficiency over state-of-the-art
methods, without trading off PSNR accuracy. The code is available at
https://github.com/LabShuHangGU/MIA-VSR.
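The block-wise skipping idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the cosine-similarity criterion, and the fixed threshold `tau` are illustrative assumptions. The sketch predicts a binary mask per spatial block from the similarity of adjacent-frame features, reuses the previously enhanced features where the mask says "unchanged", and recomputes only the remaining blocks:

```python
import numpy as np

def block_masked_update(prev_feat, curr_feat, compute_block, block=4, tau=0.9):
    """Hypothetical sketch of feature-level block-wise masked processing.

    Blocks whose features changed little relative to the previous frame
    reuse the previously enhanced features; only dissimilar blocks are
    recomputed via `compute_block` (a stand-in for an attention block).
    """
    H, W, C = curr_feat.shape
    out = np.empty_like(curr_feat)
    skipped = 0
    for y in range(0, H, block):
        for x in range(0, W, block):
            p = prev_feat[y:y + block, x:x + block].ravel()
            c = curr_feat[y:y + block, x:x + block].ravel()
            # Cosine similarity between the flattened feature blocks.
            sim = np.dot(p, c) / (np.linalg.norm(p) * np.linalg.norm(c) + 1e-8)
            if sim > tau:
                # Mask says "skip": reuse the past enhanced feature block.
                out[y:y + block, x:x + block] = prev_feat[y:y + block, x:x + block]
                skipped += 1
            else:
                # Mask says "compute": run the (stand-in) attention block.
                out[y:y + block, x:x + block] = compute_block(
                    curr_feat[y:y + block, x:x + block])
    return out, skipped
```

In the actual method the mask is produced by a learned prediction module rather than a hand-set threshold, but the computational saving has the same shape: the fraction of skipped blocks directly reduces the attention FLOPs per frame.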