Video object segmentation through semantic visual words matching

MULTIMEDIA TOOLS AND APPLICATIONS(2023)

Abstract
Video object segmentation (VOS) is widely used in computer vision. However, existing VOS algorithms struggle with object deformation, occlusion, and fast motion. We therefore propose an effective VOS algorithm based on semantic visual words matching. Specifically, given a support frame and its corresponding mask, the frame is first fed to an encoder with an embedding layer, and a clustering algorithm then generates a group of semantic visual words according to the mask. For a query frame to be segmented, a matching operation is performed against the words generated from the support frame. In this manner, each pixel of the query frame is classified into an object category according to the obtained similarity. In addition, a self-attention mechanism is applied before the words matching to enhance the embedding features and capture global dependencies. To further handle object appearance changes and global mismatches, our method also employs an online update and correction mechanism. Experiments show that the proposed method achieves competitive results on the DAVIS 2016 and DAVIS 2017 datasets: J&F-mean, the mean of region similarity and contour accuracy, reaches 83.2% and 72.3% on DAVIS 2016 and DAVIS 2017, respectively.
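The clustering and matching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the similarity measure, so plain k-means and cosine similarity are assumed here, the function names (`kmeans`, `build_visual_words`, `match_query`) and the `words_per_object` parameter are hypothetical, and the pixel embeddings are assumed to be already computed. The encoder, the self-attention step, and the online update and correction mechanism are omitted.

```python
import numpy as np


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd k-means (an assumed choice); returns (k, D) centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid: (N, k).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def build_visual_words(support_emb, support_mask, words_per_object=2):
    """Cluster the support-frame embeddings of each object into visual words.

    support_emb:  (H, W, D) float pixel embeddings (assumed precomputed).
    support_mask: (H, W) integer object ids, with 0 as background.
    Returns the words, shape (N, D), and the object id of each word, shape (N,).
    """
    words, labels = [], []
    for obj_id in np.unique(support_mask):
        pixels = support_emb[support_mask == obj_id]  # (n_pixels, D)
        k = min(words_per_object, len(pixels))
        words.append(kmeans(pixels, k))
        labels.extend([obj_id] * k)
    return np.concatenate(words), np.array(labels)


def match_query(query_emb, words, word_labels):
    """Label each query pixel with the object id of its most similar word."""
    h, w, d = query_emb.shape
    q = query_emb.reshape(-1, d)
    # Cosine similarity (an assumed choice) between pixels and words.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    v = words / (np.linalg.norm(words, axis=1, keepdims=True) + 1e-8)
    similarity = q @ v.T  # (H*W, N)
    return word_labels[similarity.argmax(axis=1)].reshape(h, w)
```

In this sketch the predicted mask for a query frame would come from `match_query(query_emb, *build_visual_words(support_emb, support_mask))`. Using several words per object, rather than a single prototype, lets one object be represented by more than one appearance cluster.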
Keywords
Video object segmentation,Clustering algorithm,Visual words,Self-attention,Online update mechanism