Improving the Performance of Processing for Small Files in Hadoop : A Case Study of Weather Data Analytics

Guru Prasad, Nagesh, Deepthi

Semantic Scholar (2014)

Abstract
Hadoop is an open-source Apache project that follows a master-slave architecture, with one master node and thousands of slave nodes. The master node acts as the name node, which stores all file metadata, and the slave nodes act as data nodes, which store the application data. Hadoop is designed to process large data sets (petabytes). It becomes a bottleneck when handling massive numbers of small files, because the name node uses more memory to store file metadata and the data nodes consume more CPU time to process the many small files. In this paper, the authors propose Optimized Hadoop, which consists of a Merge Model that merges massive small files into a single large file, together with an efficient indexing mechanism. Our experimental results show that Optimized Hadoop improves the performance of processing small files by up to 90.83% and effectively reduces the name node's memory utilization for storing file metadata. Keywords—Hadoop; Hadoop Distributed File System; MapReduce; Small Files.
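
For illustration, the sketch below shows one common way to attack the small-files problem the abstract describes: packing many small files into a single Hadoop SequenceFile, keyed by the original file name so each record remains addressable (the key serves as a simple index). This is a generic technique using the standard Hadoop FileSystem and SequenceFile APIs; it is an assumption-laden stand-in, not the paper's Optimized Hadoop Merge Model or its indexing mechanism, which are not specified here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.InputStream;

// Illustrative sketch only: merges a directory of small files into one
// SequenceFile. This is NOT the paper's implementation.
public class SmallFileMerger {
    public static void main(String[] args) throws Exception {
        Path inputDir = new Path(args[0]);   // directory of small files
        Path mergedFile = new Path(args[1]); // output SequenceFile

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(mergedFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {

            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isDirectory()) continue; // skip sub-directories

                // Read the whole small file into memory
                // (acceptable for KB-scale files).
                byte[] contents = new byte[(int) status.getLen()];
                try (InputStream in = fs.open(status.getPath())) {
                    IOUtils.readFully(in, contents, 0, contents.length);
                }

                // One record per small file: original name -> raw bytes.
                // The name key lets a later job locate individual files.
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        }
    }
}

Because the name node keeps an in-memory metadata entry per HDFS file and block, collapsing thousands of small files into one large file reduces that footprint roughly in proportion to the merge ratio, which is consistent with the memory savings the abstract reports.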