The Advantages of Multiple Classes for Reducing Overfitting from Test Set Reuse

International Conference on Machine Learning (ICML), Vol. 97, 2019

Abstract
Excessive reuse of holdout data can lead to overfitting. Known results show that, in the worst case, given the accuracies of k adaptively chosen classifiers on a dataset of size n, one can create a classifier with a bias of $\Theta(\sqrt{k/n})$ for any binary prediction problem. We show a new upper bound of $\tilde{O}(\max\{\sqrt{k\log(n)/(mn)},\, k/n\})$ on the worst-case bias that any attack can achieve in a prediction problem with m classes. Moreover, we present an efficient attack that achieves a bias of $\Omega(\sqrt{k/(m^2 n)})$ and improves on previous work for the binary setting (m = 2). We also present an inefficient attack that achieves a bias of $\tilde{\Omega}(k/n)$. Complementing our theoretical work, we give new practical attacks to stress-test multi-class benchmarks by aiming to create as large a bias as possible with a given number of queries. Our experiments show that the additional uncertainty of prediction with a large number of classes indeed mitigates the effect of our best attacks.
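To make the scale of these biases concrete, below is a minimal simulation sketch of a plurality-vote holdout attack: query k uniformly random m-class classifiers, keep those that happen to beat the 1/m chance baseline on the holdout, and combine the survivors by per-example plurality vote. This is a standard construction from the adaptive data analysis literature, not necessarily the exact attack from the paper, and all parameter values (n, k, m) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a plurality-vote holdout attack (illustrative parameters;
# a standard construction from the adaptive data analysis literature, not
# necessarily the attack analyzed in the paper).

rng = np.random.default_rng(0)

def plurality_attack_bias(m, n=10_000, k=1_000):
    """Bias (accuracy above the 1/m chance baseline) achieved by the attack."""
    # Random labels: no classifier can truly beat 1/m accuracy, so any
    # measured excess accuracy on this holdout is pure overfitting.
    labels = rng.integers(0, m, size=n)

    # Step 1: query k uniformly random classifiers and record their holdout
    # accuracies (the only feedback an attacker receives).
    queries = rng.integers(0, m, size=(k, n))
    accuracies = (queries == labels).mean(axis=1)

    # Step 2: keep the queries that happened to beat chance, then combine
    # them with a per-example plurality vote.
    good = queries[accuracies > 1.0 / m]
    counts = np.apply_along_axis(np.bincount, 0, good, minlength=m).T  # (n, m)
    final = counts.argmax(axis=1)
    return (final == labels).mean() - 1.0 / m

for m in (2, 10, 100):
    print(f"m = {m:>3}: bias = {plurality_attack_bias(m):.3f}")
```

For m = 2 this construction is known to produce a bias on the order of $\sqrt{k/n}$; as m grows, the realized bias should shrink, consistent with the abstract's claim that the additional uncertainty of multi-class prediction mitigates such attacks.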