Web User Interface as a Message - Power Law for Fraud Detection in Crowdsourced Labeling.

Sebastian Heil, Maxim Bakaev, Martin Gaedke

ICWE (2021)

Abstract

Web Engineering is becoming increasingly hungry for training data as the application of machine learning (ML) methods in the field intensifies. Human-labeled datasets are particularly indispensable for ML-based validation and design of user interfaces (UIs). The production of such datasets is often outsourced to crowdworkers, who typically have lower motivation and payment than in-house staff, so the quality of their work becomes a paramount concern. In this paper, we explore the applicability of the trending fraud detection approach based on fit to a power law to crowdsourced web UI labeling. On Amazon Mechanical Turk, 298 crowdworkers labeled over 30,000 UI elements in about 500 university homepage screenshots. We found a significant correlation between workers' precision and a goodness-of-fit measure, based on the Kolmogorov-Smirnov statistic, between the frequencies of UI elements in a worker's output and a power law. The obtained R² = 0.504 was higher than the R² = 0.432 baseline for the popular time-on-task parameter. Moreover, the distribution of UI element frequencies is much less prone to manipulation by malicious crowdworkers, which is an advantage for a crowdsourced data quality control measure. The findings of our study suggest a certain resemblance between web UIs and natural language texts, in which word frequencies are known to comply with Zipf's law.
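
The abstract does not spell out how the goodness-of-fit is computed, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a worker's output is a flat list of UI element type labels, estimates a Zipf exponent by least squares on the log-log rank-frequency curve, and reports the Kolmogorov-Smirnov distance between the empirical and fitted cumulative distributions over ranks. The function name `zipf_ks_statistic` and the fitting method are assumptions made for illustration.

```python
import math
from collections import Counter


def zipf_ks_statistic(labels):
    """Return (exponent, ks) for one worker's labels.

    Assumption: `labels` is a list of UI element type names produced by
    a single worker. A smaller ks means a closer fit to a power law.
    """
    # Frequencies of each element type, ordered from most to least common.
    counts = sorted(Counter(labels).values(), reverse=True)
    n_ranks = len(counts)
    if n_ranks < 2:
        raise ValueError("need at least two distinct label types")
    total = sum(counts)

    # Estimate the exponent s by least squares on log(frequency) vs. log(rank).
    xs = [math.log(rank) for rank in range(1, n_ranks + 1)]
    ys = [math.log(c / total) for c in counts]
    mean_x = sum(xs) / n_ranks
    mean_y = sum(ys) / n_ranks
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    s = -sxy / sxx

    # Fitted Zipf weights over the observed ranks, normalized below.
    weights = [rank ** -s for rank in range(1, n_ranks + 1)]
    norm = sum(weights)

    # KS statistic: largest absolute gap between the two CDFs.
    ks = 0.0
    empirical_cdf = 0.0
    fitted_cdf = 0.0
    for c, w in zip(counts, weights):
        empirical_cdf += c / total
        fitted_cdf += w / norm
        ks = max(ks, abs(empirical_cdf - fitted_cdf))
    return s, ks
```

In a quality-control setting, one would compute this statistic per worker and compare it with the worker's precision on a verified subset. Note that under this simple fit a perfectly uniform labeler matches a power law with exponent 0, so a practical check would likely also constrain the exponent.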

Keywords

Data quality, Distribution functions, Crowdsourcing, Amazon Mechanical Turk, Image labeling
