Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models
CoRR (2024)
Abstract
There are two broad approaches to building intelligent machine learning systems. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly performant foundation models and then invest effort in understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.