A novel approach to assess and improve syntactic interoperability in data integration

Inf. Process. Manag.(2023)

引用 0|浏览6
暂无评分
摘要
Data integration is essential to enrich a database with external information. One effective approach is to match shared identifiers across diverse databases. However, a lack of syntactic interoperability, which refers to the ability to match data based on their syntax, can pose challenges. In this paper, we present a novel method to evaluate and enhance syntactic interop-erability, considering associated costs. First, we introduce the linking index and completeness index as generic measures of fine-grained syntactic interoperability. Second, we analyze the data consistency level of the identifiers using a rule-based framework for data quality assessment. Third, we propose a data integration strategy that strikes a balance between fixing data inconsistencies and the resulting benefits, as measured by the linking and completeness indices. The approach is illustrated through two use cases: bibliographic databases and clinical trial registries. The results demonstrate that standardizing identifiers' representations can signifi-cantly improve syntactic interoperability in certain scenarios while in others, the standardization process does not yield improvements, discouraging, hence integration decisions. By conducting a cost-benefit analysis of improving data interoperability, this analysis enables data integrators to make informed decisions regarding the feasibility and advantages of proceeding with data integration.
更多
查看译文
关键词
Relational databases,Interoperability,Data quality
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要