Task-conditioned adaptation of visual features in multi-task policy learning
CoRR(2024)
摘要
Successfully addressing a wide variety of tasks is a core ability of
autonomous agents, which requires flexibly adapting the underlying
decision-making strategies and, as we argue in this work, also adapting the
underlying perception modules. An analogical argument would be the human visual
system, which uses top-down signals to focus attention determined by the
current task. Similarly, in this work, we adapt pre-trained large vision models
conditioned on specific downstream tasks in the context of multi-task policy
learning. We introduce task-conditioned adapters that do not require finetuning
any pre-trained weights, combined with a single policy trained with behavior
cloning and capable of addressing multiple tasks. We condition the policy and
visual adapters on task embeddings, which can be selected at inference if the
task is known, or alternatively inferred from a set of example demonstrations.
To this end, we propose a new optimization-based estimator. We evaluate the
method on a wide variety of tasks of the CortexBench benchmark and show that,
compared to existing work, it can be addressed with a single policy. In
particular, we demonstrate that adapting visual features is a key design choice
and that the method generalizes to unseen tasks given visual demonstrations.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要