谷歌浏览器插件
订阅小程序
在清言上使用

Enhancing Model Parallelism in Neural Architecture Search for Multidevice System.

IEEE Micro(2020)

引用 3|浏览92
暂无评分
摘要
Neural architecture search (NAS) finds favorable network topologies for better task performance. Existing hardware-aware NAS techniques only target to reduce inference latency on single CPU/GPU systems and the searched model can hardly be parallelized. To address this issue, we propose ColocNAS, the first synchronization-aware, end-to-end NAS framework that automates the design of parallelizable neural networks for multidevice systems while maintaining a high task accuracy. ColocNAS defines a new search space with elaborated connectivity to reduce device communication and synchronization. ColocNAS consists of three phases: 1) offline latency profiling that constructs a lookup table of inference latency of various networks for online runtime approximation; 2) differentiable latency-aware NAS that simultaneously minimizes inference latency and task error; and 3) reinforcement-learning-based device placement fine-tuning to further reduce the latency of the deployed model. Extensive evaluation corroborates ColocNAS's effectiveness to reduce inference latency while preserving task accuracy.
更多
查看译文
关键词
Computer architecture,Microprocessors,Parallel processing,Computational modeling,Task analysis,Neural networks,Synchronization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要