Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
arXiv (2024)
Abstract
The goal of interactive image segmentation is to delineate specific regions
within an image via visual or language prompts. Low-latency and high-quality
interactive segmentation with diverse prompts remains challenging for existing
specialist and generalist models. Specialist models, with their limited prompts
and task-specific designs, experience high latency because the image must be
recomputed every time the prompt is updated, due to the joint encoding of image
and visual prompts. Generalist models, exemplified by the Segment Anything
Model (SAM), have recently excelled in prompt diversity and efficiency, lifting
image segmentation to the foundation model era. However, for high-quality
segmentation, SAM still lags behind state-of-the-art specialist models despite
being trained with 100 times more segmentation masks. In this work, we delve
deep into the architectural differences between the two types of models. We
observe that dense representation and fusion of visual prompts are the key
design choices contributing to the high segmentation quality of specialist
models. In light of this, we reintroduce this dense design into the generalist
models, to facilitate the development of generalist models with high
segmentation quality. To densely represent diverse visual prompts, we propose
to use a dense map to capture five prompt types: clicks, boxes, polygons, scribbles,
and masks. Thus, we propose SegNext, a next-generation interactive segmentation
approach offering low latency, high quality, and diverse prompt support. Our
method outperforms current state-of-the-art methods on HQSeg-44K and DAVIS,
both quantitatively and qualitatively.
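
To make the idea of a dense prompt representation concrete, the sketch below rasterizes several prompt types into one image-sized map. It is an illustration only, not the SegNext implementation: the three-channel layout (positive, negative, region), the disk radii, and the function names `disk_map`, `box_map`, and `encode_prompts` are all assumptions, and polygons are omitted for brevity (a filled polygon would go into the region channel in the same way as a box or mask).

```python
import numpy as np


def disk_map(height, width, points, radius=2):
    """Rasterize (row, col) points as filled disks on a binary map."""
    rows, cols = np.mgrid[0:height, 0:width]
    out = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        out[(rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2] = 1.0
    return out


def box_map(height, width, box):
    """Rasterize a box (top, left, bottom, right), inclusive, as a filled rectangle."""
    top, left, bottom, right = box
    out = np.zeros((height, width), dtype=np.float32)
    out[top:bottom + 1, left:right + 1] = 1.0
    return out


def encode_prompts(height, width, pos_clicks=(), neg_clicks=(),
                   scribble=(), box=None, mask=None):
    """Stack prompts into a (3, H, W) dense map (hypothetical channel layout).

    Channel 0: positive clicks and scribble points.
    Channel 1: negative clicks.
    Channel 2: region prompts (box and/or mask).
    """
    positive = np.maximum(
        disk_map(height, width, pos_clicks),
        disk_map(height, width, scribble, radius=1),
    )
    negative = disk_map(height, width, neg_clicks)
    region = np.zeros((height, width), dtype=np.float32)
    if box is not None:
        region = np.maximum(region, box_map(height, width, box))
    if mask is not None:
        region = np.maximum(region, mask.astype(np.float32))
    return np.stack([positive, negative, region])


# Example: one positive click, one negative click, and a box on an 8x8 image.
dense = encode_prompts(8, 8, pos_clicks=[(2, 2)], neg_clicks=[(6, 6)], box=(1, 1, 4, 4))
```

Because every prompt type ends up in the same spatial format, a single dense map of this kind can be fused with image features regardless of which prompts the user supplied.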