Cost-based analysis of the impact of data completeness and representational consistency

Decision Support Systems (2023)

Abstract
Data quality is an important topic for businesses and therefore requires appropriate analysis tools. Although several rule-based systems for quality measurement exist today, their results do not always reflect the real impact of quality issues on practical data usability, which makes them poorly suited as a basis for economic decisions. This work implements and evaluates an alternative, cost-based approach to data quality analysis that starts from a "fitness for use" perspective. The practical impact of the completeness and representational consistency of data stored in an integrated relational database is investigated in an experiment with 218 volunteers. Two alternative versions of this database are prepared by manually improving their data quality. Participants are randomly assigned to one of the three databases and are given a set of questions to resolve using SQL. As questions are resolved, we measure several cost-based indicators, such as ability to solve, time to solve, and number of attempts. Results indicate that the impact of data quality issues can differ significantly from what would be expected under rule-based measurement. Effects range from almost no impact to a 65% reduction in the time needed to solve tasks. Effect sizes of up to 0.43 are observed using one-way ANCOVA tests.
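As a rough illustration of the cost-based indicators named in the abstract (ability to solve, time to solve, number of attempts), the following minimal sketch aggregates them per database version from an attempt log. The log layout, the field names, the helper `indicators`, and every value in `ATTEMPTS` are assumptions invented for this example; they are not the study's instrumentation or data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical attempt log: (participant, database_version, task, seconds_spent, solved).
# All values are invented for illustration; they are not data from the study.
ATTEMPTS = [
    ("p1", "original", "q1", 120.0, False),
    ("p1", "original", "q1", 90.0, True),
    ("p2", "improved", "q1", 60.0, True),
    ("p3", "original", "q1", 150.0, False),
]


def indicators(attempts):
    """Return solve rate, mean time, and mean attempt count per database version."""
    # First aggregate all attempts of one participant on one task.
    per_task = defaultdict(lambda: {"attempts": 0, "seconds": 0.0, "solved": False})
    for participant, db, task, seconds, solved in attempts:
        rec = per_task[(participant, db, task)]
        if rec["solved"]:
            continue  # ignore anything logged after the task was already solved
        rec["attempts"] += 1
        rec["seconds"] += seconds
        rec["solved"] = solved

    # Then group the per-task records by database version.
    by_db = defaultdict(list)
    for (_, db, _), rec in per_task.items():
        by_db[db].append(rec)

    return {
        db: {
            "solve_rate": mean(1.0 if r["solved"] else 0.0 for r in recs),
            "mean_seconds": mean(r["seconds"] for r in recs),
            "mean_attempts": mean(r["attempts"] for r in recs),
        }
        for db, recs in by_db.items()
    }


summary = indicators(ATTEMPTS)
```

Comparing the entries of `summary` across database versions gives the kind of per-group cost figures that a significance test such as the one-way ANCOVA mentioned above would then be applied to; the sketch itself performs no statistical test.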
Keywords
Cost-based analysis, Data quality, Completeness, Consistency