Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load

2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2021

Abstract
Popular parallel I/O libraries, such as HDF5, provide tuning parameters to obtain superior performance. However, selecting effective parameters on production systems is complex because the I/O software and file system layers are interdependent. Hence, application developers typically use the default parameters and often experience poor I/O performance. This work conducts a benchmarking-based analysis of HDF5 behavior across a wide variety of I/O patterns to extract its performance characteristics under production workloads. To keep the analysis well controlled, we run I/O benchmarks on POSIX-IO, MPI-IO, and HDF5 using the same I/O patterns and within the same jobs. To address the high performance variability of production environments, we repeat the benchmarks across I/O patterns, storage devices, and time intervals. From the results, we identify a consistent pattern in HDF5 behavior: appropriate configurations and operations on dataset layout and file-metadata placement can improve performance significantly. We apply our findings and evaluate the tuned I/O library on two supercomputers, Summit and Cori. The results show that our tuned parameters achieve more than a 10× I/O speedup over the default parameters on both systems, suggesting the effectiveness, stability, and generality of our solution.
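The repeat-and-summarize idea behind the benchmarking can be illustrated with a minimal single-process sketch. This is not the authors' benchmark: it covers only the POSIX-IO layer, uses Python's standard library, and the function names, pattern names, and sizes (`time_posix_write`, `run_benchmark`, `small_transfers`, `large_transfers`) are hypothetical. It repeats each write pattern several times and reports the median together with the spread, which is one simple way to cope with run-to-run variability.

```python
import os
import statistics
import tempfile
import time


def time_posix_write(path, total_bytes, transfer_size):
    """Write total_bytes to path in transfer_size pieces; return elapsed seconds."""
    buf = b"\0" * transfer_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            # os.write may write fewer bytes than requested, so count what it returns.
            written += os.write(fd, buf[: total_bytes - written])
        os.fsync(fd)  # include the flush to storage in the measured time
        return time.perf_counter() - start
    finally:
        os.close(fd)


def run_benchmark(patterns, repeats=5, directory=None):
    """Return {pattern name: (median, min, max)} write bandwidth in MiB/s.

    patterns maps a (hypothetical) pattern name to (total_bytes, transfer_size).
    """
    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.join(tmp, "bench.dat")
        for name, (total_bytes, transfer_size) in patterns.items():
            rates = []
            for _ in range(repeats):
                elapsed = time_posix_write(path, total_bytes, transfer_size)
                rates.append(total_bytes / (1024 * 1024) / max(elapsed, 1e-9))
            results[name] = (statistics.median(rates), min(rates), max(rates))
    return results


if __name__ == "__main__":
    # Illustrative patterns only: same data volume, different transfer sizes.
    patterns = {
        "small_transfers": (8 * 1024 * 1024, 4 * 1024),
        "large_transfers": (8 * 1024 * 1024, 1024 * 1024),
    }
    for name, (med, low, high) in run_benchmark(patterns).items():
        print(f"{name}: median {med:.1f} MiB/s (min {low:.1f}, max {high:.1f})")
```

Passing `directory=` points the temporary file at a particular storage device, so the same patterns can be rerun on different devices and at different times. A comparison in the spirit of the abstract would run equivalent patterns through MPI-IO and HDF5 within the same job; that part is outside this sketch.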
Keywords
production load, production systems, file system layers, benchmarking-based analysis, HDF5, production environments, performance characteristic extraction, parallel I/O libraries, I/O software, POSIX-IO, MPI-IO, dataset layout, file-metadata placement, Summit supercomputer, Cori supercomputer