TWIX: Approximate and Exact Twig Structure and Content Matching over XML Document Collections using Binary Labeling

msra(2005)

Abstract
XML queries specify predicates on the content and the structure of the elements of tree-structured XML documents. Hence, discovering the occurrences of twig (tree structure) query patterns is a core operation for XML query processing. Prior works have typically applied top-down decomposition of the twig patterns into (i) binary (parent-child or ancestor-descendant) relationships, or (ii) path expression queries, followed by a join operation to reconstruct matched twig patterns. However, most of these methods (i) rely on the user's knowledge of the underlying database to pose well-formed queries, and (ii) suffer from inspecting too many irrelevant results. In this paper, we propose a novel heuristic for matching XML twig query patterns, named TWIX, which imposes minimal restrictions on the user and substantially reduces the search space through a distributed binary labeling technique. The algorithm incorporates a holistic ranking scheme of structure and content, named TRANK, to rank and report the top-k results. Furthermore, TWIX benefits from an interactive graphical user interface for twig query matching. Experimental results on real datasets illustrate the ranking semantics and the efficient filtering of the search space.
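
To make the decompose-then-join approach attributed to prior work more concrete, here is a minimal Python sketch. It is not TWIX and not its binary labeling scheme, which the abstract does not specify. The sample document, the function names `binary_matches` and `match_twig`, and the twig "book with a title child and a fig descendant" are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<lib>
  <book><title>XML</title><chapter><section><fig/></section></chapter></book>
  <book><title>SQL</title></book>
</lib>"""


def binary_matches(root, anc_tag, desc_tag, axis):
    """Return (upper, lower) element pairs for one binary relationship.

    axis is "child" for parent-child or "descendant" for ancestor-descendant.
    """
    pairs = []
    for anc in root.iter(anc_tag):
        if axis == "child":
            candidates = [c for c in anc if c.tag == desc_tag]
        else:
            # Element.iter() also yields the element itself, so exclude it.
            candidates = [d for d in anc.iter(desc_tag) if d is not anc]
        pairs.extend((anc, d) for d in candidates)
    return pairs


def match_twig(root, twig_root, branches):
    """Match a one-level twig: a root tag plus (tag, axis) branches.

    Each branch is evaluated independently as a binary relationship, then
    the partial results are joined on the shared twig-root element.
    """
    per_branch = [binary_matches(root, twig_root, tag, axis)
                  for tag, axis in branches]
    results = []
    for node in root.iter(twig_root):
        combos = [[node]]
        for pairs in per_branch:
            partners = [d for a, d in pairs if a is node]
            # An empty partner list eliminates this candidate root.
            combos = [c + [p] for c in combos for p in partners]
        results.extend(combos)
    return results


root = ET.fromstring(DOC)
matches = match_twig(root, "book", [("title", "child"), ("fig", "descendant")])
matched_titles = [m[1].text for m in matches]
```

In this sketch, the `(book, title)` relationship is computed for both books even though only one of them survives the join. That wasted intermediate work is the kind of irrelevant result inspection the abstract describes.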