Clarify: Improving Model Robustness With Natural Language Corrections
CoRR (2024)
Abstract
In supervised learning, models are trained to extract correlations from a
static dataset. This often leads to models that rely on high-level
misconceptions. To prevent such misconceptions, we must necessarily provide
additional information beyond the training data. Existing methods incorporate
forms of additional instance-level supervision, such as labels for spurious
features or additional labeled data from a balanced distribution. Such
strategies can become prohibitively costly for large-scale datasets since they
require additional annotation at a scale close to the original training data.
We hypothesize that targeted natural language feedback about a model's
misconceptions is a more efficient form of additional supervision. We introduce
Clarify, a novel interface and method for interactively correcting model
misconceptions. Through Clarify, users need only provide a short text
description of a model's consistent failure patterns. Then, in an
entirely automated way, we use such descriptions to improve the training
process by reweighting the training data or gathering additional targeted data.
Our user studies show that non-expert users can successfully describe model
misconceptions via Clarify, improving worst-group accuracy by an average of
17.1% [...]
novel hard subpopulations in the ImageNet dataset, improving minority-split
accuracy from 21.1% [...]
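
The reweighting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each training example already has a similarity score to the user's text description (for example, from some image-text model), and it assumes a simple rule: threshold the score to split each class into "has the described feature" and "does not", then weight examples so that every (label, feature) group carries equal total weight. The function name, the threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def group_balanced_weights(labels, similarities, threshold):
    """Return one weight per example so that each (label, has_feature)
    group has the same total weight. Weights sum to len(labels).

    `similarities` are assumed to be precomputed scores between each
    example and the text description; `threshold` is a hypothetical
    cutoff for deciding whether the described feature is present.
    """
    # Assign every example to a group: its label plus feature presence.
    groups = [(y, s >= threshold) for y, s in zip(labels, similarities)]
    counts = Counter(groups)
    n_examples = len(groups)
    n_groups = len(counts)
    # Examples in small groups get proportionally larger weights.
    return [n_examples / (n_groups * counts[g]) for g in groups]


# Toy data (hypothetical): similarity to a description of a spurious feature.
labels = ["waterbird", "waterbird", "waterbird", "landbird"]
similarities = [0.9, 0.8, 0.1, 0.2]
weights = group_balanced_weights(labels, similarities, threshold=0.5)
```

Under these assumptions, the two waterbird examples that match the description share one group and are down-weighted, while the single examples in the other two groups are up-weighted. The resulting weights could be passed to a weighted loss or a weighted sampler during retraining.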