Enhancing Model Parallelism in Neural Architecture Search for Multidevice System.

Cheng Fu, Huili Chen, Zhenheng Yang, Farinaz Koushanfar, Yuandong Tian, Jishen Zhao

IEEE Micro (2020)

Abstract

Neural architecture search (NAS) finds favorable network topologies for better task performance. Existing hardware-aware NAS techniques aim only to reduce inference latency on single CPU/GPU systems, and the models they find are hard to parallelize. To address this issue, we propose ColocNAS, the first synchronization-aware, end-to-end NAS framework that automates the design of parallelizable neural networks for multidevice systems while maintaining high task accuracy. ColocNAS defines a new search space with elaborated connectivity to reduce device communication and synchronization. ColocNAS consists of three phases: 1) offline latency profiling, which constructs a lookup table of the inference latency of various networks for online runtime approximation; 2) differentiable latency-aware NAS, which simultaneously minimizes inference latency and task error; and 3) reinforcement-learning-based device placement fine-tuning, which further reduces the latency of the deployed model. Extensive evaluation corroborates ColocNAS's effectiveness in reducing inference latency while preserving task accuracy.
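
To make the first two phases more concrete, the following is a minimal sketch of how a latency lookup table can feed a differentiable latency term in a search objective. It is not the ColocNAS implementation: the operation names, the latency values, the softmax relaxation, the weighting factor `lam`, and the function names are all assumptions chosen for illustration, and the sketch ignores device placement and synchronization cost entirely.

```python
import math

# Hypothetical lookup table: profiled latency (ms) per candidate operation.
# The values are invented for illustration, not measurements from the paper.
LATENCY_MS = {"conv3x3": 1.8, "conv5x5": 3.1, "skip": 0.1}
OPS = ["conv3x3", "conv5x5", "skip"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(arch_logits, ops=OPS, table=LATENCY_MS):
    """Softmax-weighted latency of one layer's candidate operations.

    Because this is a smooth function of the architecture logits,
    it can be minimized by gradient descent together with the task loss.
    """
    probs = softmax(arch_logits)
    return sum(p * table[op] for p, op in zip(probs, ops))


def search_loss(task_loss, layer_logits, lam=0.1):
    """Assumed combined objective: task loss plus weighted total latency."""
    total_latency = sum(expected_latency(logits) for logits in layer_logits)
    return task_loss + lam * total_latency
```

With uniform logits, each operation contributes equally to a layer's estimated latency; as the logits for a cheap operation such as `skip` grow, the estimate approaches that operation's table entry, which is what lets the latency term steer the search.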

Keywords

Computer architecture, Microprocessors, Parallel processing, Computational modeling, Task analysis, Neural networks, Synchronization
