Refining Large Integrated Identity Graphs Using the Unique Name Assumption.

Shuai Wang, Joe Raad, Peter Bloem, Frank van Harmelen

ESWC (2023)

Abstract

The Unique Name Assumption (UNA) supposes that two terms with distinct identifiers from the same knowledge base do not refer to the same real-world entity. The UNA can be used to detect errors in large integrated knowledge bases. For example, some identity links may be erroneous if they lie on a path that connects two entities that are defined in the same knowledge base but refer to different real-world objects. For large knowledge bases, however, the UNA does not always hold, due to redundant IRIs that capture various encodings, languages, namespaces, versions, letter cases, etc. The UNA can still be useful for identifying erroneous links, provided it is adapted to accommodate these exceptions. To this end, we propose a concrete definition of the UNA with tolerance towards multiple exceptions, namely the internal UNA (iUNA). To compare the iUNA and other variants of the UNA, we propose a generic algorithm that can be used for refinement. The algorithm employs an SMT (Satisfiability Modulo Theories) solver and takes advantage of the latter’s ability to efficiently reason over equality. For evaluation, we identify erroneous links in an identity graph of half a billion triples extracted from the LOD Cloud, and compare our approach against community detection methods (Louvain and Leiden) as well as other identity refinement approaches.
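
The path-based error signal described in the abstract can be illustrated with a minimal sketch. It is not the SMT-based algorithm of the paper: it swaps in a plain union-find to compute the equivalence classes induced by identity links, then reports classes that contain two distinct terms from the same source. Treating an IRI's host name as its "knowledge base" is an assumption made only for this example, and the names `source_of`, `una_violations`, and the sample IRIs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def source_of(iri):
    """Assumption for this sketch: the IRI's host stands in for its knowledge base."""
    return urlsplit(iri).netloc


def find(parent, x):
    """Return the representative of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def una_violations(links):
    """Group terms connected by identity links and return, for each group,
    the sets of distinct terms that share a source (plain UNA violations)."""
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb

    # component representative -> source -> terms from that source
    groups = defaultdict(lambda: defaultdict(list))
    for term in parent:
        groups[find(parent, term)][source_of(term)].append(term)

    violations = []
    for by_source in groups.values():
        for terms in by_source.values():
            if len(terms) > 1:
                violations.append(sorted(terms))
    return sorted(violations)


# Hypothetical identity links (e.g. owl:sameAs statements) between two sources.
links = [
    ("http://a.example/Paris", "http://b.example/Paris_France"),
    ("http://b.example/Paris_France", "http://a.example/Paris_Texas"),
    ("http://a.example/Berlin", "http://b.example/Berlin"),
]
violations = una_violations(links)
print(violations)
```

Here the two `a.example` terms end up in the same equivalence class through `b.example/Paris_France`, so at least one link on that path is suspect. The sketch only flags such classes; deciding which link to remove, and tolerating legitimate duplicates as the iUNA does, is outside its scope.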

Keywords

large integrated identity graphs
