A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

2018 IEEE Symposium on VLSI Circuits

Cited by 116
Abstract
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy in training and inference, as well as 1b/2b (binary/ternary) integer for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14nm CMOS.
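
The quoted peaks are mutually consistent with the clock rate: dividing each peak throughput by 1.5 GHz gives the per-cycle operation count the datapath must sustain. A minimal sketch of that arithmetic, using only figures stated in the abstract:

```python
# Sanity check on the quoted peak numbers (all figures from the abstract).
# ops_per_cycle = peak_ops_per_second / clock_frequency
freq_hz = 1.5e9          # 1.5 GHz core clock

peaks = {
    "fp16":    1.5e12,   # 1.5 TFLOPS
    "ternary": 12e12,    # 12 TOPS
    "binary":  24e12,    # 24 TOPS
}

for mode, peak in peaks.items():
    ops_per_cycle = peak / freq_hz
    print(f"{mode:>7}: {ops_per_cycle:,.0f} ops/cycle")
# fp16:    1,000 ops/cycle (e.g., 500 multiply-accumulates)
# ternary: 8,000 ops/cycle
# binary:  16,000 ops/cycle
```

The 8x and 16x ratios over fp16 reflect how much denser 2b and 1b integer datapaths can be at the same clock.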
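For the binary mode, a common way to reach such rates is to evaluate 1b dot products with XNOR and popcount. The abstract does not describe the core's datapath at this level, so the following is an illustrative sketch of that general technique, not the core's actual implementation (the function binary_dot is hypothetical):

```python
# Illustrative XNOR-popcount binary dot product -- a standard formulation
# for 1b inference, shown here as a conceptual sketch only.
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors packed as n-bit integers
    (bit=1 encodes +1, bit=0 encodes -1)."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # 1 where the signs agree
    matches = bin(xnor).count("1")     # popcount of agreements
    return 2 * matches - n             # agreements minus disagreements

# Example: a = [+1, -1, +1, +1], b = [+1, +1, -1, +1] -> dot product = 0
print(binary_dot(0b1101, 0b1011, 4))  # -> 0
```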
Keywords
AI training, multi-TOPS AI core, deep learning training, edge devices, data centers, programmable architecture, custom ISA, neural network topologies, dataflow architecture, on-chip scratchpad hierarchy, aggressive inference performance, CMOS, binary integer, ternary integer, compute precision optimization, AI inference, scalable multi-teraOPS deep learning processor core, TOPS binary peak performance, floating point, frequency 1.5 GHz, computer speed 1.5 TFLOPS, size 14.0 nm