Spatial Constraint for Efficient Semi-Supervised Video Object Segmentation

SSRN Electronic Journal (2023)

Abstract
Semi-supervised video object segmentation is the process of tracking and segmenting objects in a video sequence based on annotated masks for one or more frames. Recently, memory-based methods have attracted a significant amount of attention due to their strong performance. Having too much redundant information stored in memory, however, makes such methods inefficient and inaccurate. Moreover, a global matching strategy is usually used for memory reading, so these methods are susceptible to interference from semantically similar objects and are prone to incorrect segmentation. We propose a spatial constraint network to overcome these problems. In particular, we introduce a time-varying sensor and a dynamic feature memory to adaptively store pixel information to facilitate the modeling of the target object, which greatly reduces information redundancy in the memory without missing critical information. Furthermore, we propose an efficient memory reader that is less computationally intensive and has a smaller footprint. More importantly, we introduce a spatial constraint module to learn spatial consistency to obtain more precise segmentation; the target and distractors can be identified by the learned spatial response. The experimental results indicate that our method is competitive with state-of-the-art methods on several benchmark datasets. Our method also achieves an approximately 30 FPS inference speed, which is close to the requirement for real-time systems.
Keywords
Video object segmentation, Memory-based methods, Redundant information, Semantically similar objects
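
The abstract notes that memory reading usually relies on global matching, which leaves it open to interference from semantically similar objects. The sketch below illustrates that generic idea only; it is not the paper's network. It assumes a softmax-normalized dot-product affinity between flattened key features, and the function name `global_memory_read` and all array shapes are hypothetical choices made for illustration.

```python
import numpy as np

def global_memory_read(query_key, memory_key, memory_value):
    """Generic global-matching memory read (illustrative assumption, not the paper's reader).

    query_key:    (C, Nq) key features of the Nq pixels in the current frame
    memory_key:   (C, Nm) key features of the Nm pixels stored in memory
    memory_value: (D, Nm) value features stored alongside the memory keys
    Returns the read-out features (D, Nq) and the matching weights (Nm, Nq).
    """
    query_key = np.asarray(query_key, dtype=float)
    memory_key = np.asarray(memory_key, dtype=float)
    memory_value = np.asarray(memory_value, dtype=float)

    # Every query pixel is compared with every memory pixel, regardless of position.
    affinity = memory_key.T @ query_key / np.sqrt(query_key.shape[0])  # (Nm, Nq)

    # Numerically stable softmax over the memory axis.
    affinity = affinity - affinity.max(axis=0, keepdims=True)
    weights = np.exp(affinity)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Each query pixel receives a weighted sum of memory values.
    return memory_value @ weights, weights

# Two memory pixels with identical appearance: one on the target (value 1),
# one on a look-alike distractor (value 0).
memory_key = np.array([[1.0, 1.0],
                       [0.0, 0.0]])
memory_value = np.array([[1.0, 0.0]])
query_key = np.array([[1.0],
                      [0.0]])

readout, weights = global_memory_read(query_key, memory_key, memory_value)
print(weights[:, 0])  # the query matches both memory pixels equally
print(readout)        # the read-out cannot separate target from distractor
```

Because the match depends only on appearance, the two memory pixels receive equal weight and the read-out is ambiguous. This is the kind of failure a learned spatial response, as described in the abstract, is meant to resolve. The cost of the affinity matrix also grows with the number of stored memory pixels, which is why redundant memory entries hurt efficiency.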