Name Similarity For Composite Element Name Matching

Naveen Ashish, Arihant Patawari, Simrat Singh Chhabra, Arthur W. Toga

BCB (2016)

Abstract

Background and Objective: Matching corresponding data elements is a critical problem in biomedical data harmonization for data sharing. The similarity of element names is one of the many factors used to determine data element matches. Determining name similarity is complicated by the fact that data element names in biomedical data are composite, i.e., composed of multiple components. We provide a better approach to determining element name similarity for composite element names.

Methods: Our solution is based on decomposing composite element names into their constituent components and then determining name similarity by comparing corresponding components across element names. We use a machine-learning-based classification approach to the problem, building upon a field-by-field matching model from record-linkage techniques.

Results: The element name similarity achieved by our approach is significantly superior to that of existing string matching techniques. The element name similarity metric consequently improves the overall matching accuracy of element matching systems.

Conclusions: Our approach is effective and has been integrated as part of a more comprehensive "schema-mapping" system, which we have developed for harmonizing biomedical datasets.
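The decompose-then-compare idea described under Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the splitting rules (delimiters and camelCase boundaries), the use of difflib's SequenceMatcher as the per-component similarity, the positional alignment of components, and the fixed vector width of 3 are all assumptions made for illustration, and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def decompose(name):
    """Split a composite element name into lowercase components.

    Assumed rules: split on underscores, hyphens, dots and whitespace,
    then on camelCase boundaries and digit runs.
    """
    tokens = []
    for part in re.split(r"[_\-.\s]+", name):
        # Acronyms (e.g. "DOB"), capitalized or lowercase words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def component_similarity(a, b):
    """Similarity of two components in [0, 1] (assumed metric: difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def feature_vector(name_a, name_b, width=3):
    """Compare components position by position, like fields in record linkage.

    Positions missing from either name get 0.0. The result is a
    fixed-length vector that a classifier could consume.
    """
    comps_a, comps_b = decompose(name_a), decompose(name_b)
    features = []
    for i in range(width):
        if i < len(comps_a) and i < len(comps_b):
            features.append(component_similarity(comps_a[i], comps_b[i]))
        else:
            features.append(0.0)
    return features


if __name__ == "__main__":
    print(decompose("PatientBirthDate"))
    print(feature_vector("PatientBirthDate", "patient_birth_dt"))
```

In a classification setting such as the one the abstract describes, vectors like these, computed for labeled pairs of matching and non-matching names, would serve as training input to a classifier. The choice of classifier and features here is left open, since the abstract does not specify them.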

Keywords

Element name matching, String similarity, Bio-medical data integration, Record-linkage, Machine-learning classification
