Customizing Text-to-Image Diffusion with Camera Viewpoint Control
arXiv (2024)
Abstract
Model customization introduces new concepts to existing text-to-image models,
enabling the generation of the new concept in novel contexts. However, such
methods lack accurate camera view control with respect to the object, and users
must resort to prompt engineering (e.g., adding "top-view") to achieve coarse
view control. In this work, we introduce a new task: enabling explicit control
of the camera viewpoint for model customization. This allows us to modify object
properties across various background scenes via text prompts, all while
incorporating the target camera pose as additional control. This new task
presents significant challenges in merging a 3D representation from the
multi-view images of the new concept with a general, 2D text-to-image model. To
bridge this gap, we propose to condition the 2D diffusion process on rendered,
view-dependent features of the new object. During training, we jointly adapt
the 2D diffusion modules and 3D feature predictions to reconstruct the object's
appearance and geometry while reducing overfitting to the input multi-view
images. Our method outperforms existing image editing and model personalization
baselines in preserving the custom object's identity while following the input
text prompt and the object's camera pose.
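
The abstract describes conditioning the 2D diffusion process on rendered, view-dependent features of the new object and jointly adapting the 2D diffusion modules with the 3D feature predictions. Below is a minimal, hypothetical PyTorch sketch of that idea; the module names, tensor shapes, toy denoiser, and the additive feature-injection scheme are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: condition a 2D denoiser on rendered, view-dependent
# object features, then jointly train both. Everything here is illustrative.
import torch
import torch.nn as nn


class ToyDenoiser(nn.Module):
    """Stand-in for a pretrained text-to-image denoiser (assumption)."""

    def __init__(self, latent_dim: int = 4, text_dim: int = 32):
        super().__init__()
        self.text_to_bias = nn.Linear(text_dim, latent_dim)
        self.body = nn.Conv2d(latent_dim, latent_dim, kernel_size=3, padding=1)

    def forward(self, x, t, text_emb):
        # fold the text embedding in as a per-channel bias (toy stand-in)
        bias = self.text_to_bias(text_emb)[:, :, None, None]
        return self.body(x + bias)


class ViewConditionedDenoiser(nn.Module):
    """Wraps a denoiser and injects pose-dependent rendered features."""

    def __init__(self, denoiser: nn.Module, feat_dim: int = 64, latent_dim: int = 4):
        super().__init__()
        self.denoiser = denoiser
        # 1x1 projection from rendered 3D feature channels to latent channels
        self.feat_proj = nn.Conv2d(feat_dim, latent_dim, kernel_size=1)

    def forward(self, noisy_latent, t, text_emb, rendered_feat):
        # rendered_feat: features rasterized from the object's 3D
        # representation at the target camera pose, shape (B, feat_dim, H, W)
        cond = self.feat_proj(rendered_feat)
        # additive injection is one plausible conditioning choice, not the
        # paper's confirmed mechanism
        return self.denoiser(noisy_latent + cond, t, text_emb)


# Joint training step: the denoiser modules and the feature projection are
# adapted together with a standard noise-prediction (reconstruction) loss.
model = ViewConditionedDenoiser(ToyDenoiser())
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

latent = torch.randn(2, 4, 16, 16)           # clean image latents
noise = torch.randn_like(latent)
t = torch.randint(0, 1000, (2,))
text_emb = torch.randn(2, 32)
rendered_feat = torch.randn(2, 64, 16, 16)   # view-dependent features

noisy = latent + noise                        # schematic forward noising
pred = model(noisy, t, text_emb, rendered_feat)
loss = nn.functional.mse_loss(pred, noise)    # denoising objective
loss.backward()
opt.step()
```

In the paper's setting, regularization during this joint adaptation would also be needed to reduce overfitting to the input multi-view images; that term is omitted from the sketch.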