FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization
arXiv (2024)
Abstract
Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies
directly without access to any known normal or abnormal samples within the
target item categories. Existing approaches typically rely on the robust
generalization capabilities of multimodal pretrained models, computing
similarities between manually crafted textual features representing "normal" or
"abnormal" semantics and image features to detect anomalies and localize
anomalous patches. However, the generic descriptions of "abnormal" often fail
to precisely match diverse types of anomalies across different object
categories. Additionally, computing feature similarities for single patches
struggles to pinpoint specific locations of anomalies with various sizes and
scales. To address these issues, we propose a novel ZSAD method called FiLo,
comprising two components: adaptively learned Fine-Grained Description (FG-Des)
and position-enhanced High-Quality Localization (HQ-Loc). FG-Des introduces
fine-grained anomaly descriptions for each category using Large Language Models
(LLMs) and employs adaptively learned textual templates to enhance the accuracy
and interpretability of anomaly detection. HQ-Loc, utilizing Grounding DINO for
preliminary localization, position-enhanced text prompts, and a Multi-scale
Multi-shape Cross-modal Interaction (MMCI) module, facilitates more accurate
localization of anomalies of different sizes and shapes. Experimental results
on datasets like MVTec and VisA demonstrate that FiLo significantly improves
the performance of ZSAD in both detection and localization, achieving
state-of-the-art performance with an image-level AUC of 83.9% and a pixel-level
AUC of 95.9%.
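
The similarity-based scoring that the abstract attributes to existing approaches can be sketched as follows. This is a minimal illustration, not FiLo's implementation: the function names, the temperature value, and the random vectors standing in for pretrained text and patch embeddings are all assumptions made for the example. Each patch is compared against a "normal" and an "abnormal" text feature, and the softmax probability of "abnormal" becomes that patch's anomaly score.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def anomaly_map(patch_feats, text_feats, temperature=0.07):
    """Per-patch anomaly probability.

    patch_feats: (H, W, D) patch embeddings (hypothetical stand-in for a vision encoder).
    text_feats:  (2, D) rows are [normal, abnormal] text embeddings (also a stand-in).
    temperature: assumed softmax temperature, not a value taken from the paper.
    """
    p = l2_normalize(patch_feats)
    t = l2_normalize(text_feats)
    logits = p @ t.T / temperature                 # (H, W, 2) scaled cosine similarities
    logits -= logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs[..., 1]                           # probability of "abnormal"

# Toy data: every patch resembles the "normal" direction except one.
rng = np.random.default_rng(0)
H, W, D = 4, 4, 8
normal_dir = rng.normal(size=D)
abnormal_dir = rng.normal(size=D)
patches = np.tile(normal_dir, (H, W, 1)) + 0.01 * rng.normal(size=(H, W, D))
patches[2, 3] = abnormal_dir                       # plant a single anomalous patch
text = np.stack([normal_dir, abnormal_dir])

amap = anomaly_map(patches, text)
image_score = float(amap.max())                    # image-level score: max over patches
peak = np.unravel_index(amap.argmax(), amap.shape) # location of the highest-scoring patch
```

Because every patch is scored independently against one generic "abnormal" feature, a scheme like this has no notion of anomaly type or spatial extent, which is the limitation the abstract says FG-Des and HQ-Loc are meant to address.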