Improving system latency of AI accelerator with on-chip pipelined activation preprocessing and multi-mode batch inference

2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021

Abstract
State-of-the-art neural network accelerators exploit massive computing parallelism to achieve high throughput. However, significant latency is observed in master-slave-based AI acceleration systems, which limits their adoption in real-time applications. Investigation of a de-facto GPU system reveals tremendous timing overhead for preprocessing of input activations, which is commonly executed on th...
Keywords
Power demand,Pipelines,Data preprocessing,Random access memory,Prototypes,Throughput,Real-time systems