InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

arXiv (Cornell University), 2024

Abstract
We introduce InteractiveVideo, a user-centric framework for video generation. Unlike traditional generative approaches that operate on user-provided images or text, our framework is designed for dynamic interaction, allowing users to instruct the generative model through various intuitive mechanisms throughout the generation process, such as text and image prompts, painting, and drag-and-drop. We propose a Synergistic Multimodal Instruction mechanism, designed to seamlessly integrate users' multimodal instructions into generative models, thus facilitating a cooperative and responsive interaction between user inputs and the generative process. This approach enables iterative and fine-grained refinement of the generation result through precise and effective user instructions. With InteractiveVideo, users are given the flexibility to meticulously tailor key aspects of a video. They can paint the reference image, edit semantics, and adjust video motions until their requirements are fully met. Code, models, and demo are available at https://github.com/invictus717/InteractiveVideo
Keywords
Interactive Television,User Interaction,Multimedia Synchronization