Explainability of Speech Recognition Transformers via Gradient-Based Attention Visualization

IEEE TRANSACTIONS ON MULTIMEDIA (2024)

Abstract
In vision Transformers, attention visualization methods generate heatmaps that highlight the class-relevant areas of an input image, offering an explanation of how the model makes its prediction. However, such methods do not directly apply to automatic speech recognition (ASR) Transformers: an ASR Transformer makes a separate prediction for every input token to form a sentence, whereas a vision Transformer makes a single overall classification for the input, so traditional attention visualization methods may fail on ASR Transformers. In this work, we propose a novel attention visualization method for ASR Transformers and attempt to explain which frames of the audio give rise to the output text. Guided by this model explainability, we also explore ways of improving the effectiveness of the ASR model. Compared with other Transformer attention visualization methods, our method is more efficient and more intuitively understandable, as it unravels the attention calculation from the information flow of the Transformer attention modules. In addition, we demonstrate the use of the visualization results in three ways: (1) we visualize attention with respect to the connectionist temporal classification (CTC) loss and train the ASR model with an adversarial attention erasing regularization, which effectively decreases the word error rate (WER) of the model and improves its generalization capability; (2) we visualize the attention on specific words, interpreting the model by demonstrating the semantic and grammatical relationships between these words; and (3) similarly, we analyze how the model manages to distinguish homophones, using contrastive explanations with respect to homophones.
Keywords
Transformers, Analytical models, Visualization, Predictive models, Data models, Computational modeling, Training, Explainability, transformer, speech recognition, attention visualization
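The gradient-based attention visualization the abstract describes can be illustrated with a minimal sketch: weight each attention map by the gradient of a scalar objective (e.g., a CTC loss) with respect to that map, keep the positive contributions, and average over heads. This is a common gradient × attention relevance scheme, not necessarily the paper's exact formulation; the function name, the ReLU clamp, and the head-averaging step here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_relevance(q, k, v, scalar_loss_fn):
    """Gradient-weighted attention map for one attention layer (illustrative).

    q, k, v: tensors of shape (heads, seq_len, head_dim).
    scalar_loss_fn: maps the attention output to a scalar (e.g. a CTC loss);
    the gradient of that scalar w.r.t. the attention weights reweights them.
    """
    scale = q.shape[-1] ** 0.5
    attn = F.softmax(q @ k.transpose(-1, -2) / scale, dim=-1)  # (heads, seq, seq)
    attn.retain_grad()                     # attn is a non-leaf tensor
    out = attn @ v                         # standard attention output
    scalar_loss_fn(out).backward()         # populates attn.grad
    rel = (attn.grad * attn).clamp(min=0)  # keep positive contributions only
    return rel.mean(dim=0)                 # average heads -> (seq, seq)

# Toy usage: relevance of each input frame under a summed stand-in "loss".
torch.manual_seed(0)
heads, seq, dim = 2, 5, 4
q = torch.randn(heads, seq, dim, requires_grad=True)
k, v = torch.randn(heads, seq, dim), torch.randn(heads, seq, dim)
rel = attention_relevance(q, k, v, lambda o: o.sum())
print(rel.shape)  # torch.Size([5, 5])
```

Row t of the resulting map indicates which input frames contributed most, positively, to the objective at position t, which is the kind of frame-to-text attribution the abstract targets.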