B2-Sampling: Fusing Balanced and Biased Sampling for Graph Contrastive Learning

KDD 2023

Abstract
Graph contrastive learning (GCL), which aims for an embedding space where semantically similar nodes are closer together, has been widely applied to graph-structured data. Researchers have proposed many approaches to define positive and negative pairs (i.e., semantically similar and dissimilar pairs) on the graph, which serve as labels for learning embedding distances. Despite their effectiveness, these approaches typically face two learning challenges. First, the number of candidate negative pairs is enormous, so it is non-trivial to select representative ones that train the model effectively. Second, the heuristics used to define positive and negative pairs (e.g., graph views or meta-path patterns) are sometimes unreliable, introducing considerable noise into both the "labelled" positive and negative pairs. In this work, we propose a novel sampling approach, B2-Sampling, to address both challenges in a unified way. On the one hand, we use balanced sampling to select the most representative negative pairs with respect to both topological and embedding diversity. On the other hand, we use biased sampling to learn and correct the labels of the most error-prone negative pairs during training. Balanced and biased sampling can be applied iteratively to discriminate and correct training pairs, boosting the performance of GCL models. B2-Sampling is designed as a framework that supports many known GCL models. Our extensive experiments on node classification, node clustering, and graph classification tasks show that B2-Sampling significantly improves the performance of GCL models with acceptable run-time overhead. Our website [11] provides access to our code and additional experimental results.
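To make the two-stage idea concrete, the sketch below is a minimal, illustrative rendering and not the authors' implementation: the greedy farthest-point heuristic for embedding diversity, the cosine-similarity threshold `tau` for detecting false negatives, and all function names are assumptions standing in for the paper's balanced and biased sampling.

```python
import numpy as np

def balanced_sample(emb, anchor, candidates, k):
    """Illustrative 'balanced sampling': pick k negatives for `anchor`
    that are spread out in embedding space, via greedy farthest-point
    selection (a stand-in for the paper's diversity-aware selection)."""
    sims = emb[candidates] @ emb[anchor]
    # seed with the candidate most similar to the anchor (a hard negative)
    chosen = [int(candidates[int(np.argmax(sims))])]
    while len(chosen) < k:
        # distance from every candidate to its nearest already-chosen negative
        dists = np.linalg.norm(
            emb[candidates][:, None, :] - emb[chosen][None, :, :], axis=-1
        )
        # greedily add the candidate farthest from the chosen set
        chosen.append(int(candidates[int(np.argmax(dists.min(axis=1)))]))
    return chosen

def correct_labels(emb, anchor, negatives, tau=0.9):
    """Illustrative 'biased sampling with label correction': a 'negative'
    whose similarity to the anchor exceeds `tau` is treated as a likely
    false negative and relabelled (a rough proxy for the paper's scheme)."""
    sims = emb[negatives] @ emb[anchor]
    keep = [n for n, s in zip(negatives, sims) if s < tau]
    flipped = [n for n, s in zip(negatives, sims) if s >= tau]
    return keep, flipped

# Toy usage on random unit-norm embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

candidates = np.arange(1, 100)  # every node except the anchor (node 0)
negatives = balanced_sample(emb, 0, candidates, k=8)
negatives, flipped = correct_labels(emb, 0, negatives, tau=0.9)
print("kept negatives:", negatives, "| relabelled as positives:", flipped)
```

In the full framework, these two steps would presumably alternate with encoder updates, so that corrected labels feed back into the contrastive loss on the next round, matching the iterative application described in the abstract.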
Keywords
graph contrastive learning, neural network, negative sampling