SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

2023 IEEE/CVF International Conference on Computer Vision (ICCV)(2022)

引用 41|浏览29
暂无评分
摘要
Click-based interactive image segmentation aims at extracting objects with limited user clicking. Hierarchical backbone is the de-facto architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to be a foundation model that can be finetuned for the downstream task without redesigning a hierarchical backbone for pretraining. Although this design is simple and has been proven effective, it has not yet been explored for interactive segmentation. To fill this gap, we propose the first plain-backbone method, termed as SimpleClick due to its simplicity in architecture, for interactive segmentation. With the plain backbone pretrained as masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance without bells and whistles. Remarkably, our method achieves 4.15 NoC@90 on SBD, improving 21.8% over previous best result. Extensive evaluation of medical images highlights the generalizability of our method. We also provide a detailed computation analysis for our method, highlighting its availability as a practical annotation tool.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要