Do Concept Bottleneck Models Obey Locality?
CoRR(2024)
摘要
Concept-based learning improves a deep learning model's interpretability by
explaining its predictions via human-understandable concepts. Deep learning
models trained under this paradigm heavily rely on the assumption that neural
networks can learn to predict the presence or absence of a given concept
independently of other concepts. Recent work, however, strongly suggests that
this assumption may fail to hold in Concept Bottleneck Models (CBMs), a
quintessential family of concept-based interpretable architectures. In this
paper, we investigate whether CBMs correctly capture the degree of conditional
independence across concepts when such concepts are localised both spatially,
by having their values entirely defined by a fixed subset of features, and
semantically, by having their values correlated with only a fixed subset of
predefined concepts. To understand locality, we analyse how changes to features
outside of a concept's spatial or semantic locality impact concept predictions.
Our results suggest that even in well-defined scenarios where the presence of a
concept is localised to a fixed feature subspace, or whose semantics are
correlated to a small subset of other concepts, CBMs fail to learn this
locality. These results cast doubt upon the quality of concept representations
learnt by CBMs and strongly suggest that concept-based explanations may be
fragile to changes outside their localities.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要