Prejudice and Caprice: A Statistical Framework for Measuring Social Discrimination in Large Language Models
CoRR (2024)
Abstract
The growing integration of large language models (LLMs) into social
operations amplifies their impact on decisions in crucial areas such as
economics, law, education, and healthcare, raising public concerns about these
models' discrimination-related safety and reliability. However, prior
discrimination-measuring frameworks assess only the average discriminatory
behavior of LLMs, which often proves inadequate because it overlooks an
additional factor that leads to discrimination, namely the variation of LLMs'
predictions across diverse contexts. In this work, we present the Prejudice-Caprice Framework
(PCF) that comprehensively measures discrimination in LLMs by considering both
their consistently biased preference and preference variation across diverse
contexts. Specifically, we mathematically dissect the aggregated contextualized
discrimination risk of LLMs into prejudice risk, originating from LLMs'
persistent prejudice, and caprice risk, stemming from their generation
inconsistency. In addition, we utilize a data-mining approach to gather
preference-detecting probes from sentence skeletons, devoid of attribute
indications, to approximate LLMs' applied contexts. While initially intended
for assessing discrimination in LLMs, our proposed PCF facilitates the
comprehensive and flexible measurement of any inductive biases, including
knowledge alongside prejudice, across models of various modalities. We apply our
discrimination-measuring framework to 12 common LLMs, yielding intriguing
findings: i) modern LLMs demonstrate significant pro-male stereotypes, ii)
LLMs' exhibited discrimination correlates with several social and economic
factors, iii) prejudice risk dominates the overall discrimination risk and
follows a normal distribution, and iv) caprice risk contributes minimally to
the overall risk but follows a fat-tailed distribution, suggesting that it is
a wild risk requiring enhanced surveillance.
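
The abstract does not give the formulas behind the prejudice/caprice split, so the following is only a minimal sketch of how a decomposition of this kind can work, not the paper's definition. It assumes a binary attribute (male vs. female), a hypothetical per-context risk equal to the squared preference gap, and a hypothetical helper name `decompose_risk`. Under those assumptions, the mean risk over contexts splits exactly into the squared mean gap (a persistent preference term) plus the variance of the gap across contexts (an inconsistency term).

```python
from statistics import fmean


def decompose_risk(male_probs):
    """Split an averaged per-context risk into two illustrative parts.

    male_probs: for each context, the probability the model assigns to the
    "male" option of a binary attribute (female gets 1 - p).

    ASSUMPTION: the per-context risk is the squared gap (p_male - p_female)^2.
    This criterion is chosen for illustration only; it is not taken from the paper.
    """
    # Signed preference gap in each context: p - (1 - p) = 2p - 1.
    gaps = [2.0 * p - 1.0 for p in male_probs]

    # Overall risk: the per-context risk averaged over all contexts.
    overall = fmean([g * g for g in gaps])

    # Persistent part: the risk of the context-averaged preference.
    prejudice = fmean(gaps) ** 2

    # Remainder: the variance of the gap across contexts (never negative).
    caprice = overall - prejudice
    return overall, prejudice, caprice


if __name__ == "__main__":
    overall, prejudice, caprice = decompose_risk([0.9, 0.7, 0.8, 0.6])
    print(f"overall={overall:.3f} prejudice={prejudice:.3f} caprice={caprice:.3f}")
```

In this toy setting, a model that gives the same skewed answer in every context has all of its risk in the first term, while a model whose preference swings between contexts accumulates risk in the second term even if its average preference looks balanced.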