Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation
CoRR (2024)
Abstract
To fully leverage the capabilities of mobile manipulation robots, it is
imperative that they are able to autonomously execute long-horizon tasks in
large unexplored environments. While large language models (LLMs) have shown
emergent reasoning skills on arbitrary tasks, existing work primarily
concentrates on explored environments, typically focusing on either navigation
or manipulation tasks in isolation. In this work, we propose MoMa-LLM, a novel
approach that grounds language models within structured representations derived
from open-vocabulary scene graphs, dynamically updated as the environment is
explored. We tightly interleave these representations with an object-centric
action space. The resulting approach is zero-shot, open-vocabulary, and readily
extendable to a spectrum of mobile manipulation and household robotic tasks. We
demonstrate the effectiveness of MoMa-LLM in a novel semantic interactive
search task in large realistic indoor environments. In extensive experiments in
both simulation and the real world, we show substantially improved search
efficiency compared to conventional baselines and state-of-the-art approaches,
as well as its applicability to more abstract tasks. We make the code publicly
available at http://moma-llm.cs.uni-freiburg.de.
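The abstract describes grounding an LLM in a dynamically updated, open-vocabulary scene graph coupled with an object-centric action space. As a rough illustration only (not the authors' implementation; all class, method, and action names below are hypothetical), such a representation might be serialized into a structured text prompt for the language model like this:

```python
# Hypothetical sketch: flatten a dynamically updated scene graph into a
# text prompt for an LLM, paired with an object-centric action space.
# Names and structure are illustrative, not from the MoMa-LLM codebase.

from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # room name -> list of (object, state) pairs observed so far
    rooms: dict = field(default_factory=dict)

    def update(self, room: str, obj: str, state: str = "closed") -> None:
        # Called whenever exploration reveals a new object.
        self.rooms.setdefault(room, []).append((obj, state))

    def to_prompt(self, task: str, actions: list) -> str:
        # Serialize the current (partial) graph into structured text
        # the LLM can reason over to pick the next action.
        lines = [f"Task: {task}", "Explored environment:"]
        for room, objects in self.rooms.items():
            objs = ", ".join(f"{o} ({s})" for o, s in objects)
            lines.append(f"- {room}: {objs}")
        lines.append("Available actions: " + ", ".join(actions))
        lines.append("Choose the next action.")
        return "\n".join(lines)

graph = SceneGraph()
graph.update("kitchen", "fridge")
graph.update("kitchen", "drawer")
graph.update("living room", "cabinet", "open")

prompt = graph.to_prompt(
    task="find the milk",
    actions=["navigate(object)", "open(object)", "explore()"],
)
print(prompt)
```

The key idea this sketch tries to capture is that the prompt is regenerated from the graph after every exploration step, so the LLM always reasons over the latest partial map rather than a fixed description.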