N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
arXiv (2024)
Abstract
Understanding complex scenes at multiple levels of abstraction remains a
formidable challenge in computer vision. To address this, we introduce Nested
Neural Feature Fields (N2F2), a novel approach that employs hierarchical
supervision to learn a single feature field, wherein different dimensions
within the same high-dimensional feature encode scene properties at varying
granularities. Our method allows for a flexible definition of hierarchies,
tailored to physical dimensions, semantics, or both, thereby
enabling a comprehensive and nuanced understanding of scenes. We leverage a 2D
class-agnostic segmentation model to provide semantically meaningful pixel
groupings at arbitrary scales in the image space, and query the CLIP
vision encoder to obtain language-aligned embeddings for each of these
segments. Our proposed hierarchical supervision method then assigns different
nested dimensions of the feature field to distill the CLIP embeddings using
deferred volumetric rendering at varying physical scales, creating a
coarse-to-fine representation. Extensive experiments show that our approach
outperforms state-of-the-art feature field distillation methods on tasks
such as open-vocabulary 3D segmentation and localization, demonstrating the
effectiveness of the learned nested feature field.
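
The nested supervision described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the per-pixel feature has already been produced by volumetric rendering, that each scale k supervises the first dims[k] channels of that feature, and that a target embedding of matching size exists for every scale (how CLIP embeddings are reduced to those sizes is not stated in the abstract). The names nested_slices, cosine_loss, and hierarchical_loss are hypothetical, and the cosine loss is an illustrative choice.

```python
import numpy as np

def nested_slices(feature, dims):
    """Return the nested prefixes feature[..., :d] for each d in dims.

    Smaller prefixes are assumed to carry coarser scene properties,
    larger prefixes finer ones (assumption for this sketch).
    """
    return [feature[..., :d] for d in dims]

def cosine_loss(pred, target, eps=1e-8):
    """Mean of (1 - cosine similarity) over all leading axes."""
    num = np.sum(pred * target, axis=-1)
    den = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1) + eps
    return float(np.mean(1.0 - num / den))

def hierarchical_loss(rendered, targets, dims):
    """Sum the per-scale losses; targets[k] must have last dimension dims[k]."""
    total = 0.0
    for prefix, target in zip(nested_slices(rendered, dims), targets):
        total += cosine_loss(prefix, target)
    return total

# Toy data: 5 pixels, one 16-dimensional rendered feature per pixel,
# supervised at three nested sizes (hypothetical values).
rng = np.random.default_rng(0)
dims = [4, 8, 16]
rendered = rng.normal(size=(5, 16))

# Random stand-ins for the per-scale target embeddings.
targets = [rng.normal(size=(5, d)) for d in dims]
loss_random = hierarchical_loss(rendered, targets, dims)

# Targets that equal the prefixes themselves give a loss close to zero.
perfect_targets = nested_slices(rendered, dims)
loss_perfect = hierarchical_loss(rendered, perfect_targets, dims)
```

Because every scale reads a prefix of the same vector, a single feature field can be queried at any granularity by truncating its output, which is the property the abstract attributes to the nested design.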