SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model

Zhengang Li, Yan Kang, Yuchen Liu, Difan Liu, Tobias Hinz, Feng Liu, Yanzhi Wang

CVPR 2024

Abstract

While AI-generated content has garnered significant attention, achieving photo-realistic video synthesis remains a formidable challenge. Despite promising advances in the generation quality of video diffusion models, their complex architectures and substantial computational demands for both training and inference create a significant gap between these models and real-world applications. This paper presents SNED, a superposition network architecture search method for efficient video diffusion models. Our method employs a supernet training paradigm that uses weight sharing to target a range of model cost and resolution options. Moreover, we propose a supernet training sampling warm-up for fast training optimization. To showcase the flexibility of our method, we conduct experiments on both pixel-space and latent-space video diffusion models. The results demonstrate that our framework consistently produces comparable results across different model options with high efficiency. In experiments with the pixel-space video diffusion model, we achieve consistent video generation results simultaneously across resolutions from 64 x 64 to 256 x 256 and model sizes from 640M to 1.6B parameters.
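The general idea of a weight-sharing supernet with a sampling warm-up can be illustrated with a small Python sketch. This is not the authors' implementation. The search space (`WIDTH_OPTIONS`, `RESOLUTION_OPTIONS`), the `SharedLinear` layer, and the linear warm-up schedule in `sample_config` are hypothetical choices made only for illustration. The sketch shows two things: smaller subnets reuse a slice of the largest model's weights, and early in training the sampler favors the full configuration before gradually mixing in smaller ones.

```python
import random
from dataclasses import dataclass

# Hypothetical search space: these width multipliers and resolutions are
# illustrative assumptions, not the configurations used in the paper.
WIDTH_OPTIONS = (0.5, 0.75, 1.0)
RESOLUTION_OPTIONS = (64, 128, 256)


@dataclass(frozen=True)
class SubnetConfig:
    width: float      # fraction of output channels the subnet keeps
    resolution: int   # training resolution (unused by the toy layer below)


class SharedLinear:
    """Hypothetical linear layer whose weights are shared by every subnet.

    A subnet with width multiplier w uses only the first round(w * out_features)
    output rows, so each smaller model is a slice of the largest one.
    """

    def __init__(self, in_features: int, out_features: int, seed: int = 0):
        rng = random.Random(seed)
        self.in_features = in_features
        self.out_features = out_features
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(in_features)]
                       for _ in range(out_features)]

    def active_rows(self, width: float) -> int:
        # Number of shared output rows a subnet of this width activates.
        return max(1, round(width * self.out_features))

    def forward(self, x: list[float], width: float) -> list[float]:
        # Only the first active_rows(width) rows participate in the product.
        rows = self.weight[: self.active_rows(width)]
        return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def sample_config(step: int, warmup_steps: int, rng: random.Random,
                  min_full_prob: float = 0.25) -> SubnetConfig:
    """Hypothetical warm-up: favor the largest subnet early, then mix in others.

    The probability of forcing the full configuration decays linearly from 1.0
    to min_full_prob over warmup_steps; otherwise a width and a resolution are
    drawn uniformly from the search space.
    """
    progress = min(step / warmup_steps, 1.0) if warmup_steps > 0 else 1.0
    full_prob = 1.0 - (1.0 - min_full_prob) * progress
    if rng.random() < full_prob:
        return SubnetConfig(max(WIDTH_OPTIONS), max(RESOLUTION_OPTIONS))
    return SubnetConfig(rng.choice(WIDTH_OPTIONS), rng.choice(RESOLUTION_OPTIONS))


if __name__ == "__main__":
    layer = SharedLinear(in_features=4, out_features=8)
    rng = random.Random(0)
    for step in range(0, 1000, 250):
        cfg = sample_config(step, warmup_steps=500, rng=rng)
        out = layer.forward([1.0, 0.5, -0.5, 2.0], cfg.width)
        print(step, cfg, len(out))
```

Because every subnet reads from the same `weight` list, a gradient update applied to the shared rows would affect all subnets that include them. This coupling is what lets a single supernet serve many model sizes. The warm-up in this sketch is one plausible schedule; the paper's actual warm-up strategy may differ.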
