SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
arxiv(2024)
摘要
This paper introduces SceneCraft, a Large Language Model (LLM) Agent
converting text descriptions into Blender-executable Python scripts which
render complex scenes with up to a hundred 3D assets. This process requires
complex spatial planning and arrangement. We tackle these challenges through a
combination of advanced abstraction, strategic planning, and library learning.
SceneCraft first models a scene graph as a blueprint, detailing the spatial
relationships among assets in the scene. SceneCraft then writes Python scripts
based on this graph, translating relationships into numerical constraints for
asset layout. Next, SceneCraft leverages the perceptual strengths of
vision-language foundation models like GPT-V to analyze rendered images and
iteratively refine the scene. On top of this process, SceneCraft features a
library learning mechanism that compiles common script functions into a
reusable library, facilitating continuous self-improvement without expensive
LLM parameter tuning. Our evaluation demonstrates that SceneCraft surpasses
existing LLM-based agents in rendering complex scenes, as shown by its
adherence to constraints and favorable human assessments. We also showcase the
broader application potential of SceneCraft by reconstructing detailed 3D
scenes from the Sintel movie and guiding a video generative model with
generated scenes as intermediary control signal.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要