Starting Point Selection and Multiple-standard Matching for Video Object Segmentation with Language Annotation

IEEE Transactions on Multimedia (2022)

Abstract
In this paper, we study language-level video object segmentation, where a first-frame language annotation describes the target object. Because a language label is normally compatible with all frames in a video, the proposed method can choose the most suitable starting frame to mitigate the initialization failure issue. Moreover, in addition to extracting visual features from static video frames, a motion-language score based on optical flow is proposed to better represent moving objects. Finally, the scores from multiple standards are aggregated using an attention-based mechanism to predict the final result. The proposed method is evaluated on four widely used video object segmentation datasets (DAVIS 2017, DAVIS 2016, SegTrack V2, and YoutubeObject), and it achieves new state-of-the-art accuracy (mean region similarity) on both DAVIS 2017 (67.2%) and DAVIS 2016 (83.5%). Source code will be published together with the paper.
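The two selection steps named in the abstract, picking a starting frame and fusing scores from several standards, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the per-frame confidence values, and the fixed attention logits are all hypothetical; in the paper the attention weights are presumably learned rather than supplied by hand.

```python
import math

def select_start_frame(frame_confidences):
    """Return the index of the frame with the highest language-matching confidence.

    Assumption: each frame has already been reduced to one scalar confidence.
    """
    return max(range(len(frame_confidences)), key=lambda i: frame_confidences[i])

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_scores(candidate_scores, attention_logits):
    """Fuse per-standard scores into one score per candidate object.

    candidate_scores: one list per candidate, holding one score per standard
        (for example, a visual-language score and a motion-language score).
    attention_logits: one logit per standard (hypothetical, hand-set here).
    """
    weights = softmax(attention_logits)
    return [sum(w * s for w, s in zip(weights, scores)) for scores in candidate_scores]

# Hypothetical per-frame confidences for a four-frame clip.
frame_confidences = [0.31, 0.58, 0.92, 0.47]
start = select_start_frame(frame_confidences)

# Two hypothetical candidates, each scored as [visual-language, motion-language].
candidates = [[0.9, 0.2], [0.4, 0.8]]
fused = aggregate_scores(candidates, attention_logits=[0.0, 0.0])
best_candidate = max(range(len(fused)), key=lambda i: fused[i])
```

With equal logits the fusion reduces to a plain average of the standards, so a candidate that is strong on motion but weaker visually can still win, which is the motivation the abstract gives for adding a motion-language score.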