Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation
CoRR (2024)
Abstract
Data annotation remains the sine qua non of machine learning and AI. Recent
empirical work on data annotation has begun to highlight the importance of
rater diversity for fairness and model performance, and new lines of research have
begun to examine the working conditions of data annotation workers, the
impact and role of annotator subjectivity on labels, and the potential
psychological harms from aspects of annotation work. This paper outlines a
critical genealogy of data annotation, starting with its psychological and
perceptual aspects. We draw on similarities with critiques of the rise of
computerized lab-based psychological experiments in the 1970s, which question
whether these experiments permit the generalization of results beyond the
laboratory settings within which these results are typically obtained. Do data
annotations permit the generalization of results beyond the settings, or
locations, in which they were obtained? Psychology is overly reliant on
participants from Western, Educated, Industrialized, Rich, and Democratic
societies (WEIRD). Many of the people who work as data annotation platform
workers, however, are not from WEIRD countries; most data annotation workers
are based in Global South countries. Social categorizations and classifications
from WEIRD countries are imposed on non-WEIRD annotators through instructions
and tasks, and through them, on data, which is then used to train or evaluate
AI models in WEIRD countries. We synthesize evidence from several recent lines
of research and argue that data annotation is a form of automated social
categorization that risks entrenching outdated and static social categories
that are in reality dynamic and changing. We propose a framework for
understanding the interplay of the global social conditions of data annotation
with the subjective phenomenological experience of data annotation work.