State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend
arXiv (2024)
Abstract
Stochastic video prediction enables the consideration of uncertainty in
future motion, thereby providing a better reflection of the dynamic nature of
the environment. Stochastic video prediction methods based on image
auto-regressive recurrent models must feed their predictions back into the
latent space. Conversely, state-space models, which decouple frame
synthesis from temporal prediction, prove to be more efficient. However,
inferring long-term temporal information about motion and generalizing to
dynamic scenarios under non-stationary assumptions remain unresolved
challenges. In this paper, we propose a state-space decomposition stochastic
video prediction model that decomposes the overall video frame generation into
deterministic appearance prediction and stochastic motion prediction. This
adaptive decomposition enhances the model's generalization to dynamic
scenarios. For motion prediction, obtaining a prior
on the long-term trend of future motion is crucial. Thus, in the stochastic
motion prediction branch, we infer the long-term motion trend from conditional
frames to guide the generation of future frames that exhibit high consistency
with the conditional frames. Experimental results demonstrate that our model
outperforms baselines on multiple datasets.
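The two-branch idea described above can be illustrated with a minimal sketch: a deterministic branch carries appearance forward, while a stochastic branch samples motion from a prior inferred over the conditional frames. This is a hypothetical toy, not the paper's implementation; `appearance_branch`, `motion_prior`, and the mean-difference "trend" are stand-ins for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def appearance_branch(frame):
    # Deterministic appearance prediction (hypothetical stand-in for the
    # paper's learned appearance branch): pass the last frame through.
    return frame.copy()

def motion_prior(cond_frames):
    # Infer a long-term motion trend from the conditional frames.
    # Hypothetical stand-in: mean frame-to-frame difference as the trend
    # mean, with a small fixed variance.
    diffs = np.diff(cond_frames, axis=0)
    mu = diffs.mean(axis=0)
    sigma = np.full_like(mu, 0.1)
    return mu, sigma

def predict_next(cond_frames):
    # Stochastic motion: sample a residual from the trend-guided prior,
    # then combine it with the deterministic appearance prediction.
    mu, sigma = motion_prior(cond_frames)
    z = mu + sigma * rng.standard_normal(mu.shape)
    return appearance_branch(cond_frames[-1]) + z

# Toy conditional clip: five 4x4 "frames" with a constant upward trend.
cond = np.stack([np.full((4, 4), float(t)) for t in range(5)])
next_frame = predict_next(cond)
print(next_frame.shape)  # (4, 4)
```

Because the sampled motion is centered on the inferred trend, repeated predictions stay consistent with the conditional frames while still varying stochastically.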