Distributed RDF Archives Querying with Spark.

Lecture Notes in Computer Science (2018)

Abstract
The prevalence of open data and the growth of information published on the web have produced a large volume of available RDF data. As published datasets evolve, users may need to access not only the current version of a dataset but also previous ones, and may want to track how the data changes over time. To this end, single-machine RDF archiving systems and benchmarks have been proposed, but they do not scale well when querying large RDF archives. Distributed data management systems offer a promising direction, providing scalability and parallel processing of large volumes of RDF data. In this paper, we study and compare commonly used RDF archiving techniques and querying strategies on the distributed computing platform Spark. We propose a formal mapping of versioning queries defined in SPARQL into Spark SQL. We run a series of experiments on these queries to study the effects of partitioning and distributing RDF archives.
Keywords
RDF archives, Distributed systems, Versioning queries, SPARQL, SPARK, SPARK SQL