SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
CoRR (2024)
Abstract
Diffusion-based super-resolution (SR) models have recently garnered
significant attention due to their potent restoration capabilities. However,
conventional diffusion models sample noise from a single distribution, which
constrains their ability to handle real-world scenes and complex textures
across semantic regions. With the success of the Segment Anything Model (SAM),
generating sufficiently fine-grained region masks can enhance the detail
recovery of diffusion-based SR models. However, directly integrating SAM
into SR models would incur a much higher computational cost. In this paper, we
propose the SAM-DiffSR model, which can utilize the fine-grained structure
information from SAM in the process of sampling noise to improve the image
quality without additional computational cost during inference. During
training, we encode structural position information into the segmentation
mask produced by SAM. The encoded mask is then integrated into the forward
diffusion process by modulating the sampled noise with it. This adjustment
allows us to independently adapt the noise mean within each corresponding
segmentation area. The diffusion model is trained to estimate this modulated
noise. Crucially, our proposed framework does not alter the reverse diffusion
process and does not
require SAM at inference. Experimental results demonstrate the effectiveness of
our proposed method, showcasing superior performance in suppressing artifacts
and surpassing existing diffusion-based methods by up to 0.74 dB in terms of
PSNR on the DIV2K dataset. The code and dataset are available at
https://github.com/lose4578/SAM-DiffSR.
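The core idea described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: `encode_mask` stands in for the paper's structural-position encoding (the exact encoding may differ), and the forward step shifts the Gaussian noise mean region-wise with the encoded mask, so the denoiser is trained against the shifted noise.

```python
import numpy as np

def encode_mask(seg_mask, scale=0.1):
    # Hypothetical structural-position encoding: map each segment id to a
    # small per-region offset derived from the region's normalized centroid.
    encoded = np.zeros(seg_mask.shape, dtype=np.float64)
    for sid in np.unique(seg_mask):
        region = seg_mask == sid
        ys, xs = np.nonzero(region)
        pos = (ys.mean() + xs.mean()) / sum(seg_mask.shape)
        encoded[region] = scale * pos
    return encoded

def modulated_forward_diffusion(x0, seg_mask, alpha_bar_t, rng):
    # Sample standard Gaussian noise, then shift its mean per segmentation
    # region using the encoded SAM mask. The diffusion model would be
    # trained to estimate eps_mod rather than plain eps.
    eps = rng.standard_normal(x0.shape)
    eps_mod = eps + encode_mask(seg_mask)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps_mod
    return x_t, eps_mod
```

Because the mask only reshapes the training target for the noise estimator, the reverse (sampling) process is unchanged and SAM is not needed at inference, consistent with the abstract's claim of no added inference cost.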