Extending Contrastive Learning to Unsupervised Coreset Selection

Jeongwoo Ju, Heechul Jung, Yoonju Oh, Junmo Kim

IEEE Access (2022)

Abstract

Self-supervised contrastive learning offers a means of learning informative features from a pool of unlabeled data. In this paper, we investigate another use for it: we propose a coreset selection method that requires no labels at all. Among self-supervised methods, contrastive learning was recently proposed and has consistently delivered the highest performance. This prompted us to choose two leading contrastive learning methods: the simple framework for contrastive learning of visual representations (SimCLR) and the momentum contrast (MoCo) framework. We calculated the cosine similarity of each example at every epoch throughout the contrastive learning process and accumulated these values to obtain a coreset score. Our assumption was that a sample with low accumulated similarity would likely belong to the coreset. Compared with existing coreset selection methods that require labels, our approach reduces the cost associated with human annotation. At a subset size of 30%, the unsupervised method improved on a randomly selected subset by 1.25% (CIFAR10), 0.82% (SVHN), and 0.19% (QMNIST). Furthermore, our results are comparable to those of existing supervised coreset selection methods. At the same subset size of 30%, the supervised method (forgetting events) led the proposed method by 0.81% on CIFAR10 and by 0.01% on QMNIST, whereas the proposed method outperformed it by 2.08% on SVHN. In addition, our approach remained robust even when the coreset selection model and the target model were not identical (e.g., ResNet18 as the selection model and ResNet101 as the target model).
Lastly, a coreset cross test gave more concrete evidence that our coreset examples are highly informative, showing a clear performance gap between the coreset and non-coreset samples. At a subset size of 30%, the performance pairs (training on the coreset and testing on the non-coreset, training on the non-coreset and testing on the coreset) were (94.27%, 67.39%) for CIFAR10, (98.24%, 83.30%) for SVHN, and (99.89%, 93.07%) for QMNIST.
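
The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which vectors the cosine similarity is taken between, so the code assumes it is the similarity between the embeddings of two augmented views of the same sample, recorded once per epoch. The function names (cosine_similarity, accumulate_scores, select_coreset) and the input layout are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def accumulate_scores(per_epoch_views):
    """Sum each sample's per-epoch cosine similarity into one score.

    per_epoch_views: list over epochs; each entry is a list with one
    (z1, z2) pair per sample, where z1 and z2 are assumed to be the
    embeddings of two augmented views of that sample (an assumption,
    not stated in the abstract).
    """
    num_samples = len(per_epoch_views[0])
    scores = [0.0] * num_samples
    for epoch in per_epoch_views:
        for i, (z1, z2) in enumerate(epoch):
            scores[i] += cosine_similarity(z1, z2)
    return scores


def select_coreset(scores, fraction):
    """Return indices of the lowest-scoring samples, lowest first.

    Follows the abstract's assumption that low accumulated similarity
    marks a coreset sample. Keeps at least one sample.
    """
    k = max(1, int(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:k]
```

In practice the embeddings would come from a SimCLR or MoCo encoder during training, and select_coreset(scores, 0.3) would correspond to the 30% subset size reported above; both of those wiring details are left out of the sketch.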

Keywords

Task analysis, Training, Measurement, Annotations, Licenses, Feature extraction, Deep learning, Coreset selection, image classification, self-supervised learning, contrastive learning
