LLM meets Vision-Language Models for Zero-Shot One-Class Classification
arXiv (2024)
Abstract
We consider the problem of zero-shot one-class visual classification. In this
setting, only the label of the target class is available, and the goal is to
discriminate between positive and negative query samples without requiring any
validation example from the target task. We propose a two-step solution that
first queries large language models for visually confusing objects and then
relies on vision-language pre-trained models (e.g., CLIP) to perform
classification. By adapting large-scale vision benchmarks, we demonstrate the
ability of the proposed method to outperform adapted off-the-shelf alternatives
in this setting. Namely, we propose a realistic benchmark where negative query
samples are drawn from the same original dataset as positive ones, including a
granularity-controlled version of iNaturalist, where negative samples are at a
fixed distance in the taxonomy tree from the positive ones. Our work shows that
it is possible to discriminate between a single category and other semantically
related ones using only its label.
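The two-step pipeline described in the abstract (query an LLM for visually confusing categories, then compare CLIP-style similarities) can be sketched with a simple decision rule. The function names, the cosine-similarity rule, and the stub embeddings below are illustrative assumptions, not the paper's actual implementation; in practice the embeddings would come from CLIP's text and image encoders.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def one_class_decision(image_emb, target_emb, confuser_embs):
    """Hypothetical decision rule: the query is positive if its embedding
    is closer to the target-label embedding than to every confuser-label
    embedding (confusers would be proposed by the LLM in step one)."""
    target_sim = cosine(image_emb, target_emb)
    return all(target_sim > cosine(image_emb, c) for c in confuser_embs)

# Toy 2-D stand-ins for CLIP embeddings (purely illustrative).
target = [1.0, 0.0]          # e.g., text embedding of "golden retriever"
confusers = [[0.0, 1.0]]     # e.g., text embedding of an LLM-suggested confuser
positive_query = [0.9, 0.1]  # image embedding near the target
negative_query = [0.1, 0.9]  # image embedding near the confuser

print(one_class_decision(positive_query, target, confusers))  # True
print(one_class_decision(negative_query, target, confusers))  # False
```

The key property of this rule is that it needs no validation data from the target task: the negative "anchors" are synthesized from the LLM's confuser suggestions rather than tuned on held-out examples.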