Equitable Top-k Results for Long Tail Data.

Proceedings of the ACM on Management of Data(2023)

引用 0|浏览0
暂无评分
摘要
For datasets exhibiting long tail phenomenon, we identify a fairness concern in existing top-k algorithms, that return a "fixed" set of k results for a given query. This causes a handful of popular records (products, items, etc) getting overexposed and always be returned to the user query, whereas, there exists a long tail of niche records that may be equally desirable (have similar utility). To alleviate this, we propose θ-Equiv-top-k-MMSP inside existing top-k algorithms - instead of returning a fixed top-k set, it generates all (or many) top-k sets that are equivalent in utility and creates a probability distribution over those sets. The end user will be returned one of these sets during the query time proportional to its associated probability, such that, after many draws from many end users, each record will have as equal exposure as possible (governed by uniform selection probability). θ-Equiv-top-k-MMSP is formalized with two sub-problems. (a) θ-Equiv-top-k-Sets to produce a set S of sets, each set has k records, where the sets are equivalent in utility with the top-k set; (b) MaxMinFair to produce a probability distribution over S, that is, PDF(S), such that the records in S have uniform selection probability. We formally study the hardness of θ-Equiv-top-k-MMSP. We present multiple algorithmic results - (a) An exact solution for θ-Equiv-top-k-Sets, and MaxMinFair. (b) We design highly scalable algorithms that solve θ-Equiv-top-k-Sets through a random walk and is backed by probability theory, as well as a greedy solution designed for MaxMinFair. (c) We finally present an adaptive random walk based algorithm that solves θ-Equiv-top-k-Sets and MaxMinFair at the same time. We empirically study how θ-Equiv-top-k-MMSP can alleviate a equitable exposure concerns that group fairness suffers from. We run extensive experiments using 6 datasets and design intuitive baseline algorithms that corroborate our theoretical analysis.
更多
查看译文
关键词
equal exposure,fairness,long tail data,optimization algorithm,top-k retrieval
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要