Automata Extraction from Transformers
arXiv (2024)
Abstract
In modern machine learning (ML) systems, Transformer-based architectures have
achieved milestone success across a broad spectrum of tasks, yet understanding
their operational mechanisms remains an open problem. To improve the
transparency of ML systems, automata extraction methods, which interpret
stateful ML models as automata, typically through formal languages, have proven
effective for explaining the mechanism of recurrent neural networks (RNNs).
However, few works have applied this paradigm to Transformer models. In
particular, how Transformers process formal languages, and what their
limitations are in this area, remains unexplored. In this paper, we propose an
automata extraction algorithm specifically designed for Transformer models.
Treating the Transformer model as a black-box system, we track how its internal
latent representations are transformed during operation, and then use classical
pedagogical approaches such as the L* algorithm to interpret them as
deterministic finite-state automata (DFAs). Overall, our study reveals how the
Transformer model comprehends the structure of formal languages, which not only
enhances the interpretability of Transformer-based ML systems but also marks a
crucial step toward a deeper understanding of how ML systems process formal
languages. Code and data are available at
https://github.com/Zhang-Yihao/Transfomer2DFA.
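
To make the extraction loop concrete, below is a minimal sketch of L*-style DFA learning from a black-box sequence classifier, in the spirit of the pipeline the abstract describes. Everything here is illustrative: `model_accepts` is a hypothetical stand-in for querying a trained Transformer on a word and thresholding its accept/reject output (the paper's tracking of latent representations is not reproduced), `ALPHABET` and the toy target language are assumptions, and the equivalence query is approximated by bounded exhaustive testing, a common substitute when only membership queries are available.

```python
# Minimal L*-style DFA extraction sketch against a black-box oracle.
from itertools import product

ALPHABET = ["a", "b"]  # assumed toy alphabet

def model_accepts(word: str) -> bool:
    # Hypothetical oracle: stands in for running the Transformer on `word`.
    # Toy target language here: words with an even number of 'a's.
    return word.count("a") % 2 == 0

def row(prefix: str, suffixes: list[str]) -> tuple:
    # One observation-table row: membership results for prefix + suffix.
    return tuple(model_accepts(prefix + s) for s in suffixes)

def learn_dfa(max_len: int = 8):
    prefixes = [""]   # access strings S (rows of the observation table)
    suffixes = [""]   # distinguishing experiments E (columns)
    while True:
        # Closedness: every one-letter extension of a state in S must
        # match the row of some existing state; otherwise promote it.
        rows = {row(p, suffixes): p for p in prefixes}
        closed = True
        for p in list(prefixes):
            for a in ALPHABET:
                r = row(p + a, suffixes)
                if r not in rows:
                    prefixes.append(p + a)
                    rows[r] = p + a
                    closed = False
        if not closed:
            continue
        # Hypothesis DFA: states are row-equivalence classes, named by
        # their representative access string; the start state is "".
        delta = {(rows[row(p, suffixes)], a): rows[row(p + a, suffixes)]
                 for p in prefixes for a in ALPHABET}
        accept = {p for p in rows.values() if model_accepts(p)}

        def hyp_accepts(word: str) -> bool:
            state = ""
            for a in word:
                state = delta[(state, a)]
            return state in accept

        # Approximate equivalence query: compare hypothesis and oracle on
        # all words up to `max_len`; any disagreement is a counterexample.
        cex = None
        for n in range(max_len + 1):
            for tup in product(ALPHABET, repeat=n):
                w = "".join(tup)
                if hyp_accepts(w) != model_accepts(w):
                    cex = w
                    break
            if cex is not None:
                break
        if cex is None:
            return delta, accept
        # Counterexample handling (Maler-Pnueli variant): add all of its
        # suffixes as new distinguishing experiments and rebuild the table.
        for i in range(len(cex)):
            if cex[i:] not in suffixes:
                suffixes.append(cex[i:])

delta, accept = learn_dfa()
print("states:", sorted({q for q, _ in delta}), "accepting:", sorted(accept))
```

For the toy oracle above this converges to the expected two-state DFA. Replacing `model_accepts` with a wrapper around a trained Transformer classifier turns the same loop into a black-box extraction procedure of the kind the paper proposes.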