谷歌浏览器插件
订阅小程序
在清言上使用

Optimizing TensorFlow Performance by Reconstructing the Convolution Routine

Transactions on smart processing and computing(2021)

引用 0|浏览0
暂无评分
摘要
Using deep learning, we can currently build computational models composed of multiple processing layers to learn representations of data. Convolutional neural networks (CNNs) have been widely adopted to achieve significant performance in image recognition and classification. TensorFlow, an open-source deep learning framework from Google, uses profiling to select one convolution algorithm, from among several available, as the core of a CNN to deliver the best performance in terms of execution time and memory usage. However, the overhead from profiling is considerably significant, because TensorFlow executes and profiles all the available algorithms for the best selection whenever an application is launched. We observe that memory usage overshoots during profiling, which limits data parallelism, and thus, fails to deliver maximum performance. In this paper, we present a novel profiling method to reduce overhead by storing the profile result from the first run and reusing it from the second run on. Using Inception-V3, we achieved up to 1.12 times and 1.11 times higher throughput, compared to the vanilla TensorFlow and TensorFlow with XLA JIT compilation, respectively, without losing accuracy.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要