Scalable Parallel Scientific Computing Using Twister4Azure

semanticscholar(2012)

Abstract
Recent advances in data-intensive computing for scientific discovery are fueling dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment for scientists to perform data analytics. The challenges of large-scale distributed computation demand new frameworks that are specifically tailored to cloud characteristics in order to harness the power of clouds easily and effectively. Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure Cloud. It extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling a wide array of data mining and data analysis applications on the Azure cloud. This paper discusses the applicability of Twister4Azure to scientific computation, highlighting its fault tolerance, efficiency, and simplicity. We study four data-intensive applications: two iterative scientific applications, Multi-Dimensional Scaling and KMeans Clustering, and two data-intensive pleasingly parallel scientific applications, BLAST+ sequence searching and Smith-Waterman sequence alignment. Performance measurements show results comparable to, or a factor of 2 to 4 better than, those of traditional MapReduce runtimes, deployed on up to 256 instances and for jobs with tens of thousands of tasks. We also study and present solutions to several factors that affect the performance of iterative MapReduce applications on the Windows Azure Cloud.

Keywords: Iterative MapReduce, Cloud Computing, HPC, Scientific applications, Azure
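
To make the idea of an iterative MapReduce computation concrete, the following is a minimal single-process sketch of KMeans Clustering written as a repeated map/reduce loop. It is not the Twister4Azure API: the function names (`kmeans_map`, `kmeans_reduce`, `iterative_kmeans`), the convergence test, and the `max_iterations` and `tolerance` parameters are assumptions made purely for illustration. The point is only that the same map and reduce steps are rerun, with the reduce output (the new centroids) fed into the next iteration.

```python
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map step (illustrative): emit (nearest centroid index, (point, 1))."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (illustrative): average all points assigned to one centroid."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            total[d] += point[d]
        count += n
    return tuple(t / count for t in total)


def iterative_kmeans(points, centroids, max_iterations=20, tolerance=1e-9):
    """Hypothetical driver: rerun map and reduce until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Map phase: group intermediate values by centroid index.
        groups = defaultdict(list)
        for point in points:
            key, value = kmeans_map(point, centroids)
            groups[key].append(value)
        # Reduce phase: one new centroid per key; keep the old one if it got no points.
        new_centroids = [
            kmeans_reduce(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]
        # The reduce output becomes the input of the next iteration.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tolerance:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(iterative_kmeans(data, [(0, 0), (10, 10)]))
```

In a distributed runtime, the map calls would run in parallel over partitions of the data and the per-iteration state would be distributed to workers; this sketch keeps everything in one process so that only the loop structure is visible.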