Similarity flooding: a versatile graph matching algorithm and its application to schema matching

Sergey Melnik, Hector Garcia-Molina, Erhard Rahm

San Jose, CA (2002)

Abstract

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and, if necessary, adjust the results. As a matter of fact, we evaluate the 'accuracy' of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
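The pipeline described in the abstract (two input graphs, a fixpoint computation over node pairs, then a filter that selects part of the mapping) can be illustrated with a minimal sketch. The abstract does not give the update rule, the propagation weights, or the filter, so everything below is an assumption made for illustration: the function names (`build_propagation_graph`, `similarity_fixpoint`, `best_match_filter`), the rule that each pair adds the scores of its neighbours divided by their fan-out, the normalization by the maximum score, and the "best candidate per node" filter are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def build_propagation_graph(edges_a, edges_b):
    """Link pair (a, b) with (a2, b2) when both graphs contain an edge
    with the same label, a -> a2 in graph A and b -> b2 in graph B."""
    neighbors = defaultdict(set)
    for (a, label_a, a2), (b, label_b, b2) in product(edges_a, edges_b):
        if label_a == label_b:
            neighbors[(a, b)].add((a2, b2))
            neighbors[(a2, b2)].add((a, b))  # similarity flows both ways
    return neighbors


def similarity_fixpoint(edges_a, edges_b, initial=None, max_iter=100, eps=1e-6):
    """Iterate an assumed update rule until the scores stop changing."""
    neighbors = build_propagation_graph(edges_a, edges_b)
    pairs = list(neighbors)
    if not pairs:
        return {}
    initial = initial or {}
    sigma = {p: initial.get(p, 1.0) for p in pairs}
    for _ in range(max_iter):
        # Assumed rule: keep the old score and add each neighbour's score,
        # split evenly across that neighbour's own links.
        nxt = {
            p: sigma[p] + sum(sigma[q] / len(neighbors[q]) for q in neighbors[p])
            for p in pairs
        }
        top = max(nxt.values())
        nxt = {p: v / top for p, v in nxt.items()}  # normalize to [0, 1]
        delta = max(abs(nxt[p] - sigma[p]) for p in pairs)
        sigma = nxt
        if delta < eps:  # fixpoint reached within tolerance
            break
    return sigma


def best_match_filter(sigma):
    """Assumed filter: keep, for each node of graph A, its top-scoring partner."""
    best = {}
    for (a, b), score in sigma.items():
        if a not in best or score > best[a][1]:
            best[a] = (b, score)
    return {a: b for a, (b, _) in best.items()}


# Toy schemas as (source, edge label, target) triples; the names are made up.
edges_a = [("Person", "name", "PName"), ("Person", "born", "Birthdate")]
edges_b = [("Employee", "name", "EName"), ("Employee", "born", "DOB")]

sigma = similarity_fixpoint(edges_a, edges_b)
mapping = best_match_filter(sigma)
print(mapping)
```

In this toy input, only pairs reachable through equally labelled edges are ever scored, so the filter has exactly one candidate per node. On realistic schemas several candidates compete, which is where the choice of filter and the human check mentioned in the abstract come in.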

Keywords

data handling, data structures, data warehouses, pattern matching, accuracy metric, biochemical applications, data schemas, data warehousing, e-business, filters, fixpoint computation, graph matching algorithm, high-level operators, information models, mappings, schema matching, similarity flooding, user labor savings
