Effective Bayesian-network-based missing value imputation enhanced by crowdsourcing

Knowledge-Based Systems (2020)

Abstract
During data collection, incompleteness is one of the most serious data quality problems. Traditional imputation methods mostly rely on statistics and machine learning techniques, but both types of methods have limited accuracy because they lack sufficient information about the missing data. To obtain more information, recent methods resort to external sources such as knowledge bases or the World Wide Web. Unfortunately, such methods may still be of limited help: knowledge bases may hold little information about the missing values, and the web may contain too much noise. To tackle these issues, this paper adopts crowdsourcing as the external source, since the hundreds of thousands of ordinary workers on a crowdsourcing platform can provide high-quality information based on contextual knowledge and human cognitive ability. To reduce the cost of crowdsourcing, we propose a joint imputation model that integrates crowdsourcing into the process of Bayesian inference. We first construct a Bayesian network over the attributes of the dataset and then infer the missing attribute values by Bayesian inference. To improve the accuracy of this inference, we outsource a small number of informative tasks to crowd workers, selecting the tasks based on their uncertainty and influence. The proposed approach is evaluated in extensive experiments on real-world datasets, using both a simulated crowd and two real crowdsourcing platforms. The experimental results show that our approach outperforms other imputation approaches.
Keywords
Missing values, Bayesian network, Crowdsourcing
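The abstract describes the pipeline only at a high level (Bayesian network, Bayesian inference for missing cells, crowd tasks chosen by uncertainty and influence), so the following is a minimal sketch under stated assumptions, not the paper's method. It assumes a small hand-specified discrete network with invented probabilities, exact inference by brute-force enumeration, entropy of the posterior as "uncertainty", and, as a hypothetical stand-in for "influence", the number of other missing cells in the same record that are network neighbors of the attribute. All names (`posterior`, `impute_and_rank`, the attributes and CPT values) are illustrative.

```python
import itertools
import math

# Hypothetical toy network: Education -> Occupation, and both -> Income.
DOMAINS = {
    "Education": ["low", "high"],
    "Occupation": ["manual", "office"],
    "Income": ["low", "high"],
}
PARENTS = {
    "Education": [],
    "Occupation": ["Education"],
    "Income": ["Education", "Occupation"],
}
# CPTs: parent-value tuple -> {value: probability}. Invented numbers; a real
# system would learn them from the complete part of the dataset.
CPT = {
    "Education": {(): {"low": 0.6, "high": 0.4}},
    "Occupation": {
        ("low",): {"manual": 0.7, "office": 0.3},
        ("high",): {"manual": 0.2, "office": 0.8},
    },
    "Income": {
        ("low", "manual"): {"low": 0.8, "high": 0.2},
        ("low", "office"): {"low": 0.5, "high": 0.5},
        ("high", "manual"): {"low": 0.6, "high": 0.4},
        ("high", "office"): {"low": 0.2, "high": 0.8},
    },
}


def joint(assignment):
    """P(full assignment) as the product of each node's CPT entry."""
    p = 1.0
    for var in PARENTS:
        key = tuple(assignment[q] for q in PARENTS[var])
        p *= CPT[var][key][assignment[var]]
    return p


def posterior(target, evidence):
    """P(target | evidence), summing out the other unobserved variables by enumeration."""
    hidden = [v for v in PARENTS if v != target and v not in evidence]
    scores = {}
    for value in DOMAINS[target]:
        total = 0.0
        for combo in itertools.product(*(DOMAINS[h] for h in hidden)):
            assignment = {**evidence, **dict(zip(hidden, combo)), target: value}
            total += joint(assignment)
        scores[value] = total
    z = sum(scores.values())
    return {value: s / z for value, s in scores.items()}


def entropy(dist):
    """Shannon entropy in bits, used here as the uncertainty of an imputation."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def neighbors(var):
    """Parents and children of var in the network."""
    children = {c for c, ps in PARENTS.items() if var in ps}
    return set(PARENTS[var]) | children


def impute_and_rank(records, k):
    """MAP-impute every missing cell (None) and pick k cells to send to the crowd.

    Hypothetical task score: entropy * (1 + number of other missing cells in the
    same record whose attributes are network neighbors of this attribute).
    """
    imputed, candidates = {}, []
    for i, rec in enumerate(records):
        evidence = {a: v for a, v in rec.items() if v is not None}
        missing = {a for a, v in rec.items() if v is None}
        for a in sorted(missing):
            dist = posterior(a, evidence)
            imputed[(i, a)] = max(dist, key=dist.get)
            influence = len(neighbors(a) & missing)
            candidates.append((entropy(dist) * (1 + influence), i, a))
    candidates.sort(reverse=True)
    tasks = [(i, a) for _, i, a in candidates[:k]]
    return imputed, tasks


if __name__ == "__main__":
    records = [
        {"Education": "high", "Occupation": None, "Income": "high"},
        {"Education": None, "Occupation": None, "Income": "low"},
        {"Education": "low", "Occupation": "manual", "Income": None},
    ]
    imputed, tasks = impute_and_rank(records, k=1)
    print("imputed:", imputed)
    print("crowd tasks:", tasks)
    # Treat a (simulated) crowd answer as observed evidence and re-run inference,
    # so the answer also sharpens the other missing cells of that record.
    i, a = tasks[0]
    records[i][a] = DOMAINS[a][0]  # stand-in for a real worker's answer
    print("after crowd answer:", impute_and_rank(records, k=1)[0])
```

Multiplying uncertainty by `1 + influence` favors cells whose answers, once fed back as evidence, also sharpen other missing cells; this reflects the joint-inference idea in the abstract, but the exact weighting is an assumption. Enumeration is exponential in the number of hidden variables, so it only suits toy networks; a practical system would use variable elimination or another dedicated inference algorithm.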