A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation
IEEE Transactions on Knowledge and Data Engineering (2024)
Abstract
Recent years have witnessed great success in handling graph-related tasks
with Graph Neural Networks (GNNs). Despite the great academic success of GNNs,
Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical
industrial applications. One reason for such an academic-industry gap is the
neighborhood-fetching latency incurred by data dependency in GNNs. To reduce
this gap, Graph Knowledge Distillation (GKD) has been proposed, usually based on a
standard teacher-student architecture, to distill knowledge from a large
teacher GNN into a lightweight student GNN or MLP. However, we find in this
paper that neither teachers nor GNNs are necessary for graph knowledge
distillation. We propose a Teacher-Free Graph Self-Distillation (TGS) framework
that requires neither a teacher model nor GNNs during training or
inference. More importantly, the proposed TGS framework is purely based on
MLPs, where structural information is only implicitly used to guide dual
knowledge self-distillation between the target node and its neighborhood. As a
result, TGS enjoys the benefits of graph topology awareness in training but is
free from data dependency in inference. Extensive experiments have shown that
the performance of vanilla MLPs can be greatly improved with dual
self-distillation, e.g., TGS improves over vanilla MLPs by 15.54% on average
and outperforms state-of-the-art GKD algorithms on six real-world datasets. In
terms of inference speed, TGS infers 75×-89× faster than existing GNNs and
16×-25× faster than classical inference acceleration methods.
Keywords
Graph neural networks, graph knowledge distillation, inference acceleration