Crowdsourcing Practice for Efficient Data Labeling: Aggregation, Incremental Relabeling, and Pricing

Alexey Drutsa, Dmitry Ustalov, Evfrosiniya Zerminova, Valentina Fedorova, Olga Megorskaya, Daria Baidakova

SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020

Abstract

In this tutorial, we present a portion of the unique industry experience in efficient data labeling via crowdsourcing, shared by leading researchers and engineers from Yandex. We will introduce data labeling via public crowdsourcing marketplaces and present the key components of efficient label collection. This will be followed by a practice session, in which participants will choose one of several real label collection tasks, experiment with settings for the labeling process, and launch their own label collection project on one of the largest crowdsourcing marketplaces. The projects will be run on real crowds during the tutorial session. While crowd performers annotate the projects set up by the attendees, we will present the major theoretical results in efficient aggregation, incremental relabeling, and dynamic pricing. We will also discuss their strengths, weaknesses, and applicability to real-world tasks, summarizing five years of our research and industrial expertise in crowdsourcing. Finally, participants will receive feedback on their projects and practical advice on how to make them more efficient. We invite beginners, advanced specialists, and researchers to learn how to collect high-quality labeled data efficiently.
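To make "answer aggregation" and "incremental relabeling" concrete, here is a minimal sketch under simplifying assumptions. It is not the method presented in the tutorial. It uses plain majority vote as an aggregation baseline and a hypothetical stopping rule, `needs_more_labels`, with illustrative parameters `min_margin` and `max_labels`. Under that rule, a task gets another label until its leading answer is ahead of the runner-up by a fixed margin or its per-task label budget is spent.

```python
from collections import Counter


def majority_vote(labels):
    """Aggregate one task's answers by majority vote.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return Counter(labels).most_common(1)[0][0]


def needs_more_labels(labels, min_margin=2, max_labels=5):
    """Hypothetical incremental-relabeling rule (illustrative only).

    Request another label while the top answer's lead over the runner-up
    is below min_margin, unless max_labels answers were already collected.
    """
    if len(labels) >= max_labels:
        return False  # budget for this task is exhausted
    top_two = Counter(labels).most_common(2)
    if not top_two:
        return True  # no answers yet
    top = top_two[0][1]
    second = top_two[1][1] if len(top_two) > 1 else 0
    return top - second < min_margin


# Usage: hypothetical answers collected so far for three tasks.
answers = {
    "task1": ["cat", "cat", "dog"],
    "task2": ["dog", "dog"],
    "task3": ["cat"],
}
for task, labels in answers.items():
    print(task, majority_vote(labels), needs_more_labels(labels))
```

Majority vote ignores differences in performer reliability. This is one reason weighted or model-based aggregation methods are often preferred in practice.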

Keywords

crowdsourcing, data annotation, quality control, task design, answer aggregation, incremental relabeling, dynamic pricing