Adaptive Indexing for Distributed Array Processing

BigData Congress (2014)

Abstract
Scientists face a data deluge in scientific exploration. Scientific instruments and experiments collect big data, usually as multidimensional arrays stored across many files. Distributed computing techniques such as MapReduce make exploring these large datasets practical. Indexing is a well-known way to shorten query processing time, but most existing indexing methods require a full load of the raw data to build the index. In this paper, we propose a distributed adaptive indexing method for distributed array-oriented query processing. Our method does not require a full scan of the array data. For each subarray accessed by a subtask, we divide the subarray into multiple logical blocks of a suitable size. When a query is handled, the normal processing routine is executed; meanwhile, the index for the blocks accessed by the query is built at low cost. The index therefore grows as queries are processed. This incremental approach exploits the data accessed by historical queries and eliminates the long load procedure. Experiments show that our adaptive indexing, implemented over Hadoop and Hive, effectively accelerates array-oriented query processing without introducing much overhead.
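The incremental idea described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the abstract does not say what the index stores, so this sketch assumes a per-block (min, max) summary, a 1-D array, and a range-count query. The class name AdaptiveBlockIndex, its method names, and the rule of indexing only blocks that a query covers completely are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


class AdaptiveBlockIndex:
    """Hypothetical per-block (min, max) index that grows as queries run.

    Assumption: the index content is a min/max summary per logical block;
    the source abstract does not specify the actual index structure.
    """

    def __init__(self, data: List[float], block_size: int) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.data = data
        self.block_size = block_size
        # block id -> (min, max); empty at start, so no up-front load phase.
        self.summaries: Dict[int, Tuple[float, float]] = {}
        self.blocks_scanned = 0  # counts block reads, to show the saving

    def count_in_range(self, start: int, stop: int, lo: float, hi: float) -> int:
        """Count values v with lo <= v <= hi in data[start:stop]."""
        start = max(start, 0)
        stop = min(stop, len(self.data))
        if start >= stop:
            return 0
        matches = 0
        first_block = start // self.block_size
        last_block = (stop - 1) // self.block_size
        for block in range(first_block, last_block + 1):
            block_start = block * self.block_size
            block_stop = min(block_start + self.block_size, len(self.data))
            summary = self.summaries.get(block)
            # Skip the block when its summary proves nothing can match.
            if summary is not None and (summary[1] < lo or summary[0] > hi):
                continue
            s, e = max(start, block_start), min(stop, block_stop)
            chunk = self.data[s:e]
            self.blocks_scanned += 1
            matches += sum(1 for v in chunk if lo <= v <= hi)
            # Side effect of normal processing: index the block, but only
            # when the query already read all of it (no extra I/O).
            if summary is None and s == block_start and e == block_stop:
                self.summaries[block] = (min(chunk), max(chunk))
        return matches


data = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
index = AdaptiveBlockIndex(data, block_size=4)
first = index.count_in_range(0, 12, 10, 13)   # scans all three blocks
scanned_after_first = index.blocks_scanned
second = index.count_in_range(0, 12, 10, 13)  # two blocks are now skipped
scanned_after_second = index.blocks_scanned
```

In this sketch the first query pays the normal scan cost and leaves summaries behind for the blocks it covered completely; a repeated query can then skip blocks whose summaries rule out a match. A distributed version would keep such summaries per subtask, which this example does not attempt to model.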
Keywords
multidimensional array, scientific experiments, big data, distributed array-oriented query processing, mapreduce, indexing, scientific information systems, data deluge, database indexing, long load procedure, hadoop, scientific explorations, hive, scientific instruments, array-oriented query processing, query handling, distributed computing techniques, distributed array processing, distributed processing, multidimensional arrays, distributed adaptive indexing method, query processing