Improving the Strength of Human-Like Models in Chess

ICLR 2023

Abstract
Designing AI systems that capture human-like behavior has attracted growing attention in applications where humans may want to learn from, or need to collaborate with, these AI systems. Many existing works in designing human-like AI have taken a supervised learning approach that learns from data of human behavior, with the goal of creating models that can accurately predict human behavior. While this approach has shown success in capturing human behavior at different skill levels and even identifying individual behavioral styles, it also suffers from the drawback of mimicking human mistakes. Moreover, existing models only capture a snapshot of human behavior, leaving the question of how to improve them---e.g., from one human skill level to a stronger one---largely unanswered. Using chess as an experimental domain, we investigate the question of teaching an existing human-like model to be stronger using a data-efficient curriculum, while maintaining the model's human similarity. To achieve this goal, we extend the concept of curriculum learning to settings with multiple labeling strategies, allowing us to vary both the curriculum (dataset) and the teacher (labeling strategy). We find that the choice of teacher has a strong impact on both playing strength and human similarity; for example, a teacher that is too strong can be less effective at improving playing strength and degrade human similarity more rapidly. We also find that the choice of curriculum can impact these metrics, but to a smaller extent; for example, training on a curriculum of human mistakes provides only a marginal benefit over training on a random curriculum. Finally, we show that our strengthened models achieve human similarity on datasets corresponding to their strengthened level of play, suggesting that our curriculum training methodology is improving them in human-like steps.
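The abstract gives no implementation details, but the core setup it describes (fine-tune an existing human-like model on a curriculum of positions, where the labels come from a chosen teacher) can be sketched as follows. Everything below is a hypothetical illustration: the tabular policy, the teacher functions, and the difficulty proxy are stand-ins, not the paper's actual model, engines, or datasets.

```python
import random
from collections import defaultdict, Counter

def build_curriculum(positions, order="random", difficulty=None, seed=0):
    """Order the fine-tuning data. A 'mistakes' curriculum sorts by a
    difficulty proxy (hypothetical); otherwise shuffle (the paper's
    'random curriculum' baseline)."""
    data = list(positions)
    if order == "mistakes" and difficulty is not None:
        data.sort(key=difficulty)
    else:
        random.Random(seed).shuffle(data)
    return data

class TabularPolicy:
    """Toy stand-in for a human-like model: per-position move counts."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, position, move):
        self.counts[position][move] += 1

    def predict(self, position):
        if self.counts[position]:
            return self.counts[position].most_common(1)[0][0]
        return 0  # arbitrary default move for unseen positions

def finetune(model, curriculum, teacher):
    """One pass over the curriculum: each position is labeled by the
    chosen teacher, so both the dataset and the labeling strategy vary."""
    for pos in curriculum:
        model.update(pos, teacher(pos))
    return model

# Hypothetical teachers mapping a position to a move index; in the paper
# these would be labeling strategies of different playing strengths.
strong_teacher = lambda pos: pos % 3
weak_teacher = lambda pos: (pos * 7) % 3

positions = list(range(50))
model = finetune(TabularPolicy(), build_curriculum(positions), strong_teacher)

# Agreement with the teacher is a crude proxy for playing strength here;
# the paper additionally tracks human similarity, which this toy omits.
agreement = sum(model.predict(p) == strong_teacher(p)
                for p in positions) / len(positions)
```

Swapping `strong_teacher` for `weak_teacher`, or `order="random"` for `order="mistakes"`, reproduces the two axes the abstract varies: the teacher (labeling strategy) and the curriculum (dataset ordering).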
Keywords
Human-like AI,Curriculum Learning