Reproducibility and Performance of Deep Learning Applications for Cancer Detection in Pathological Images

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2019)

Abstract
Convolutional Neural Networks (CNNs) are used for automatic cancer detection in pathological images. These data-driven experiments are difficult to reproduce because the CNNs may require CUDA-enabled Nvidia GPUs for acceleration, and training is often performed on a large dataset stored on a researcher's computer, inaccessible to others. We introduce the RED file format for reproducible experiment description, in which executable programs are packaged and referenced as Docker container images. Data inputs and outputs are described as network resources using standard transmission and authentication protocols instead of local file paths. Following the FAIR guiding principles, the RED format is based on and compatible with the established Common Workflow Language specification. RED files are interpreted by the accompanying Curious Containers (CC) software. Arbitrarily large datasets are mounted inside containers via FUSE network filesystems such as SSHFS. SSHFS is compared to NFS and a local SSD in artificial benchmarks and in a CNN training scenario, where SSHFS decreases performance by a factor of 1.8. We are convinced that RED can greatly improve the reproducibility of deep learning workloads and data-driven experiments. This is particularly important in clinical scenarios, where the result of an analysis may contribute to a patient's treatment.
Keywords
Reproducibility, Performance, Deep Learning, Machine Learning, Container, Docker, Filesystem in Userspace, CUDA, FAIR Principles, Common Workflow Language, Reproducible Experiment Descriptions, Curious Containers