ADDRESSING BOTTLENECKS FOR DEEP NEURAL NETWORK EXECUTION OF A GRAPHICS PROCESSOR UNIT

user-5d8054e8530c708f9920ccce(2020)

Abstract
A method includes receiving a non-optimized deep neural network (DNN), identifying sets of contributing and/or non-contributing synapse vectors, and generating an optimized DNN based on the non-optimized DNN. A method includes loading two strings into a first register, loading contents of the first register into an on-chip register, loading a first set of bits of the on-chip register into a second register, loading a second set of bits of the on-chip register into a third register, computing on the second register and third register, and writing the contents of the second register and third register to off-chip memory. A method includes extending a parallel thread instruction set architecture of a processor. A processor includes a plurality of floating point units including a data fission unit and an instruction unit. A runtime system includes an off-chip memory, registers, and on-chip memory. The runtime system includes a synapse vector elimination kernel.
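The register-based method in the abstract (pack two values into one register, stage the packed word on chip, split it into two bit fields, compute on each, write both back) can be illustrated with plain integer arithmetic. This is only a minimal sketch under stated assumptions: the 32-bit register width, the 16-bit halves, the reading of "strings" as bit strings, and the names `pack`, `split`, and `process` are all hypothetical and do not come from the source, which does not specify widths, layouts, or the computation performed.

```python
# Illustrative sketch only: widths and names below are assumptions,
# not details taken from the abstract.
WORD_BITS = 32                    # assumed register width
HALF_BITS = WORD_BITS // 2        # assumed width of each packed value
HALF_MASK = (1 << HALF_BITS) - 1  # mask selecting one 16-bit field


def pack(a: int, b: int) -> int:
    """Pack two 16-bit values into one 32-bit word (a low, b high)."""
    if not (0 <= a <= HALF_MASK and 0 <= b <= HALF_MASK):
        raise ValueError("each value must fit in HALF_BITS bits")
    return (b << HALF_BITS) | a


def split(word: int) -> tuple[int, int]:
    """Split a 32-bit word into its low and high 16-bit fields."""
    return word & HALF_MASK, (word >> HALF_BITS) & HALF_MASK


def process(a: int, b: int, op) -> list[int]:
    """Follow the abstract's sequence of steps with plain integers."""
    first_register = pack(a, b)        # load two bit strings into a first register
    on_chip_register = first_register  # copy the contents to an on-chip register
    second_register, third_register = split(on_chip_register)  # two sets of bits
    second_register = op(second_register) & HALF_MASK  # compute on second register
    third_register = op(third_register) & HALF_MASK    # compute on third register
    off_chip_memory = [second_register, third_register]  # write results off chip
    return off_chip_memory


if __name__ == "__main__":
    result = process(0x1234, 0xABCD, lambda x: x + 1)
    print([hex(v) for v in result])
```

Under these assumptions, the point of packing is that a single wide load moves two narrow operands at once, so the number of memory transactions per operand is halved before the values are separated for computation.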
Keywords
Instruction unit,Instruction set,Runtime system,Thread (computing),Floating point,Artificial neural network,Kernel (linear algebra),Parallel computing,Computer science,Graphical processing unit