Starting Point Selection and Multiple-standard Matching for Video Object Segmentation with Language Annotation

Mingjie Sun, Jimin Xiao, Eng Gee Lim, Yao Zhao

IEEE Transactions on Multimedia (2022)

Abstract

In this paper, we study language-level video object segmentation, where a first-frame language annotation is provided to describe the target object. A language label is normally compatible with all frames in a video, and the proposed method exploits this property to choose the most suitable starting frame, mitigating the initialization failure issue. Moreover, apart from extracting visual features from a static video frame, a motion-language score based on optical flow is proposed to better represent moving objects. Finally, the scores of multiple standards are aggregated using an attention-based mechanism to predict the final result. The proposed method is evaluated on four widely used video object segmentation datasets, DAVIS 2017, DAVIS 2016, SegTrack V2 and YouTube-Objects, and achieves new state-of-the-art accuracy (mean region similarity) on both DAVIS 2017 (67.2%) and DAVIS 2016 (83.5%). Source code will be published together with the paper.
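The two ideas named in the abstract, starting frame selection and aggregation of multiple scores, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the rule of picking the frame whose best candidate has the highest language-matching score, and the use of softmax-normalized weights as a stand-in for the attention-based mechanism. The abstract does not specify the paper's actual networks or selection criterion.

```python
import math


def select_starting_frame(frame_scores):
    """Return the index of the frame whose best candidate scores highest.

    frame_scores[t][i] is a hypothetical language-matching score for
    candidate object i in frame t. Because the language label applies to
    every frame, any frame may serve as the starting point.
    """
    best_per_frame = [max(scores) for scores in frame_scores]
    return max(range(len(best_per_frame)), key=best_per_frame.__getitem__)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_scores(standard_scores, attention_logits):
    """Fuse per-standard candidate scores with softmax-normalized weights.

    standard_scores[k][i] is the score of candidate i under standard k
    (for example a visual-language score and a motion-language score).
    attention_logits[k] is an assumed, externally supplied logit for
    standard k; in a learned model it would come from a network.
    """
    weights = softmax(attention_logits)
    n_candidates = len(standard_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, standard_scores))
        for i in range(n_candidates)
    ]


# Toy usage with made-up numbers.
frame_scores = [[0.2, 0.4], [0.9, 0.1], [0.5, 0.6]]
start = select_starting_frame(frame_scores)

visual_language = [0.2, 0.8]
motion_language = [0.6, 0.4]
fused = aggregate_scores([visual_language, motion_language], [0.0, 0.0])
best_candidate = max(range(len(fused)), key=fused.__getitem__)
```

With equal logits the fusion reduces to a plain average of the two standards; unequal logits would let one standard, such as the motion-language score for fast-moving objects, dominate the final decision.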
