Dynamic repair of categorical data with edit rules

Expert Systems with Applications (2022)

Abstract
In this paper, a dynamic setting for data quality improvement is studied. In such a setting, data quality rules are repeatedly searched for and their violations fixed until stability is reached. The constraints considered here are simple constant edit rules, and the search is done via association analysis. Violations are repaired with the set cover method. This paper contributes to the field of data quality in three ways. First, it is shown that, with appropriate filtering, association analysis is an appealing tool for discovering data quality rules with high precision. Second, when edit rules are limited to logical implications such as association rules, then under reasonable circumstances the time complexity of rule implication reduces from exponential to quadratic. This result is formalized as the strong generator theorem. Third, a detailed analysis of data repair in the dynamic setting is provided, and the conditions for termination are established. Empirical results indicate that if the initial precision of the rules is high, repeated search-and-repair boosts recall with only a limited drop in precision.
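The following Python code is a rough, toy illustration of the search-and-repair loop described above. It makes several assumptions that are not taken from the paper:

* Rules are restricted to single-antecedent constant rules of the form A = a => B = b.
* Rules are found by counting support and confidence.
* A record violates a rule when it matches the antecedent but not the consequent.
* The attributes to change are picked with a greedy set-cover heuristic.
* The loop re-mines and re-repairs until no cell changes.

This is not the authors' algorithm. The pairwise rule shape, the thresholds, the tie-breaking and the value-selection step are all illustrative assumptions, and so are the helper names `mine_rules`, `violated`, `greedy_cover`, `repair` and `dynamic_repair`.

```python
from collections import Counter
from itertools import permutations

Record = dict[str, str]
Rule = tuple[str, str, str, str]  # (A, a, B, b) encodes the rule A = a => B = b


def mine_rules(data: list[Record], min_support: int, min_conf: float) -> list[Rule]:
    """Toy association analysis restricted to single-antecedent constant rules."""
    rules: list[Rule] = []
    for lhs_attr, rhs_attr in permutations(data[0], 2):
        lhs_counts = Counter(r[lhs_attr] for r in data)
        pair_counts = Counter((r[lhs_attr], r[rhs_attr]) for r in data)
        for (a, b), n in pair_counts.items():
            # Filtering: keep only frequent, high-confidence rules.
            if n >= min_support and n / lhs_counts[a] >= min_conf:
                rules.append((lhs_attr, a, rhs_attr, b))
    return rules


def violated(rec: Record, rules: list[Rule]) -> list[Rule]:
    """Rules whose antecedent holds in rec but whose consequent does not."""
    return [r for r in rules if rec[r[0]] == r[1] and rec[r[2]] != r[3]]


def greedy_cover(viol: list[Rule]) -> list[str]:
    """Greedy set cover: choose attributes until each violated rule mentions one."""
    uncovered, chosen = list(viol), []
    while uncovered:
        # Consequents are counted first, so ties favor changing the consequent.
        counts = Counter(x for r in uncovered for x in (r[2], r[0]))
        best = max(counts, key=counts.__getitem__)
        chosen.append(best)
        uncovered = [r for r in uncovered if best not in (r[0], r[2])]
    return chosen


def repair(rec: Record, rules: list[Rule], domains: dict[str, list[str]]) -> Record:
    """Change the covering attributes, picking values that minimize violations."""
    if not violated(rec, rules):
        return rec
    fixed = dict(rec)
    for attr in greedy_cover(violated(rec, rules)):
        # Fewest remaining violations first; on a tie, keep the current value.
        fixed[attr] = min(
            domains[attr],
            key=lambda v: (len(violated({**fixed, attr: v}, rules)), v != fixed[attr]),
        )
    return fixed


def dynamic_repair(data: list[Record], min_support: int = 2,
                   min_conf: float = 0.8, max_rounds: int = 10) -> list[Record]:
    """Alternate rule search and repair until no cell changes (or max_rounds)."""
    domains = {a: sorted({r[a] for r in data}) for a in data[0]}
    for _ in range(max_rounds):
        rules = mine_rules(data, min_support, min_conf)
        repaired = [repair(r, rules, domains) for r in data]
        if repaired == data:  # stability: no record changed in this round
            break
        data = repaired
    return data


records = ([{"city": "Paris", "country": "FR"} for _ in range(5)]
           + [{"city": "Paris", "country": "DE"}]
           + [{"city": "Berlin", "country": "DE"} for _ in range(3)])
print(dynamic_repair(records)[5])  # {'city': 'Paris', 'country': 'FR'}
```

In this toy run, the rule city = Paris => country = FR has confidence 5/6, so it passes the 0.8 filter. The lone Paris/DE record is therefore repaired to FR in the first round, and the second round finds no violations.

The `max_rounds` cap is only a safeguard. The paper analyzes the conditions under which repeated search-and-repair terminates, and this sketch does not attempt to reproduce that analysis. Keeping `min_conf` above 0.5 ensures that two mined rules with the same antecedent cannot require different values for the same consequent attribute.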
Keywords
Data quality, Data repair, Edit rules