Leon: A Distributed RDF Engine for Multi-Query Processing
Database Systems for Advanced Applications (DASFAA 2019), Part I (2019)
Abstract
Although similar queries keep springing up in real query logs, few RDF systems address this problem. In this paper, we propose Leon, a distributed RDF system that also handles the multi-query problem. First, we apply a characteristic-set-based partitioning scheme. This scheme (i) supports fully parallel processing of joins within characteristic sets; and (ii) minimizes data communication by transmitting intermediate results directly instead of broadcasting them. Then, Leon revisits the classical problem of multi-query optimization (MQO) in the context of RDF/SPARQL. In light of the NP-hardness of multi-query optimization for SPARQL, we propose a heuristic algorithm that partitions the input batch of queries into groups and discovers the common sub-queries of multiple SPARQL queries. Our MQO algorithm incorporates a carefully designed cost model to generate execution plans. Our experiments with synthetic and real datasets verify that: (i) Leon's startup overhead is low; (ii) Leon consistently outperforms centralized RDF engines by 1-2 orders of magnitude and is competitive with state-of-the-art distributed RDF engines; and (iii) our MQO approach consistently achieves a 10x speedup over the baseline method.
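The characteristic-set partitioning idea can be illustrated with a minimal sketch. It assumes the usual definition of a characteristic set (the set of predicates a subject appears with) and a hypothetical greedy placement that assigns each characteristic set to the least-loaded worker. The abstract does not say how Leon actually places sets on workers, and the names `partition_by_characteristic_set` and `local_star_join` are illustrative only.

```python
from collections import defaultdict
from itertools import product


def characteristic_sets(triples):
    """Map each subject to its characteristic set: the predicates it appears with."""
    preds = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    return {s: frozenset(ps) for s, ps in preds.items()}


def partition_by_characteristic_set(triples, num_workers):
    """Hypothetical placement: co-locate all triples whose subjects share a
    characteristic set, assigning each set greedily to the least-loaded worker."""
    cs_of = characteristic_sets(triples)
    groups = defaultdict(list)  # characteristic set -> triples of its subjects
    for t in triples:
        groups[cs_of[t[0]]].append(t)
    workers = [[] for _ in range(num_workers)]
    # Largest groups first (ties broken by predicate names) for better balance.
    for cs in sorted(groups, key=lambda c: (-len(groups[c]), sorted(c))):
        target = min(range(num_workers), key=lambda w: len(workers[w]))
        workers[target].extend(groups[cs])
    return workers


def local_star_join(partition, predicates):
    """Evaluate the star pattern ?s p1 ?o1 . ?s p2 ?o2 ... on one partition.
    No communication is needed: each subject's triples all live on one worker."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in partition:
        by_subject[s][p].append(o)
    rows = []
    for s, po in by_subject.items():
        if all(p in po for p in predicates):
            # One row per combination of objects (predicates may be multi-valued).
            for objs in product(*(po[p] for p in predicates)):
                rows.append((s, *objs))
    return rows


triples = [
    ("alice", "name", "Alice"), ("alice", "age", "30"),
    ("bob", "name", "Bob"), ("bob", "age", "25"),
    ("carol", "name", "Carol"), ("carol", "worksAt", "acme"),
    ("acme", "label", "ACME"),
]
workers = partition_by_characteristic_set(triples, num_workers=2)
rows = [r for w in workers for r in local_star_join(w, ["name", "age"])]
print(sorted(rows))  # [('alice', 'Alice', '30'), ('bob', 'Bob', '25')]
```

Because every triple of a subject lands on the worker that owns that subject's characteristic set, each worker can evaluate a star pattern on its own and the final answer is just the union of the local results. Joins that cross characteristic sets (for example `?p worksAt ?c . ?c label ?l`) would still require shipping intermediate results between workers, which is where the direct-transmission strategy described above applies; that step is not modeled in this sketch.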
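The grouping step of multi-query optimization can be sketched under strong simplifying assumptions: every query is a single-subject star pattern with distinct predicates, represented as a set of (predicate, object) pairs, so the common sub-query of a group is simply the intersection of those sets. The greedy grouping rule, the `min_shared` threshold, and the helper names (`star`, `group_star_queries`, `execute_batch`) are hypothetical stand-ins; the abstract does not describe Leon's actual heuristic, and its cost model is not modeled here.

```python
VAR = "?"  # placeholder for a fresh object variable in a star pattern


def star(*preds):
    """Hypothetical helper: the star query ?s p1 ?o1 . ?s p2 ?o2 ... as a pattern set."""
    return frozenset((p, VAR) for p in preds)


def group_star_queries(queries, min_shared=2):
    """Greedy heuristic (illustrative only): add each query to the first group whose
    current common sub-query it shares at least `min_shared` patterns with, shrinking
    that common part to the intersection; otherwise open a new group."""
    groups = []  # list of [common_patterns, member_indices]
    for i, q in enumerate(queries):
        for g in groups:
            shared = g[0] & q
            if len(shared) >= min_shared:
                g[0] = shared
                g[1].append(i)
                break
        else:
            groups.append([q, [i]])
    return groups


def matches(props, patterns):
    """True if a subject's {predicate: set(objects)} map satisfies every pattern."""
    return all(p in props and (o == VAR or o in props[p]) for p, o in patterns)


def execute_batch(data, queries, min_shared=2):
    """Evaluate each group's common sub-query once, then apply each member's
    residual patterns to that shared result. Returns query index -> subjects."""
    answers = {}
    for common, members in group_star_queries(queries, min_shared):
        shared = [s for s, props in data.items() if matches(props, common)]
        for i in members:
            residual = queries[i] - common
            answers[i] = sorted(s for s in shared if matches(data[s], residual))
    return answers


data = {
    "alice": {"name": {"Alice"}, "age": {"30"}, "email": {"a@example.org"}},
    "bob": {"name": {"Bob"}, "age": {"25"}, "worksAt": {"acme"}},
    "acme": {"label": {"ACME"}, "foundedIn": {"1990"}},
}
queries = [
    star("name", "age", "email"),
    star("name", "age", "worksAt"),
    star("label", "foundedIn"),
    star("name", "age"),
]
print(execute_batch(data, queries))
# {0: ['alice'], 1: ['bob'], 3: ['alice', 'bob'], 2: ['acme']}
```

Here the shared pattern `?s name ?n . ?s age ?a` is evaluated once for three of the four queries, and each query only checks its leftover patterns against that smaller result. Sharing pays off only when the common part is selective enough to outweigh materializing it, which is the kind of trade-off a cost model is meant to settle.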