Pearls Hide Behind Linearity: Simplifying Deep Convolutional Networks for Embedded Hardware Systems via Linearity Grafting

2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024

Abstract
The increasing complexity of convolutional neural networks (CNNs) has fueled a huge demand for compression. Nonetheless, network pruning, the most effective knob, fails to deliver Pareto-optimal networks. To tackle this issue, we introduce a novel pruning-free compression framework dubbed Domino, which revisits the trade-off between accuracy and efficiency from the fresh perspective of linearity versus non-linearity. Specifically, Domino leverages two predictors, a vanilla latency predictor and a meta-accuracy predictor, to identify the less important non-linear building blocks, which are then grafted with linear counterparts. The grafted network is trained on the target task to recover accuracy, after which each grafted linear building block, consisting of multiple consecutive linear layers, is reparameterized into a single linear layer to boost efficiency on the target hardware without degrading accuracy on the target task. Extensive experiments on two popular Nvidia Jetson embedded platforms (i.e., Xavier and Nano) and two representative networks (i.e., MobileNetV2 and ResNet50) clearly demonstrate the superiority of Domino. For example, Domino-Aggressive achieves +10.6%/+8.8% higher top-1/top-5 accuracy on ImageNet than MobileNetV2 ×0.2, while delivering a 1.9×/1.3× speedup on Xavier/Nano.
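The reparameterization step the abstract describes rests on a standard identity: once the non-linearity between two affine layers is removed, their composition is itself a single affine map. The sketch below (not the authors' code; a minimal pure-Python illustration with made-up weights) shows how two consecutive linear layers collapse into one with weight W = W2·W1 and bias b = W2·b1 + b2, producing identical outputs at lower inference cost.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

# Two consecutive linear (affine) layers with NO non-linearity in between.
# Weights here are illustrative placeholders, not values from the paper.
W1, b1 = [[2.0, 0.0], [1.0, 1.0]], [0.5, -0.5]
W2, b2 = [[1.0, 1.0]], [1.0]

def forward_two(x):
    """Sequential evaluation: y = W2 (W1 x + b1) + b2."""
    return vadd(matvec(W2, vadd(matvec(W1, x), b1)), b2)

# Reparameterize into one layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

def forward_one(x):
    """Single-layer evaluation with the merged parameters."""
    return vadd(matvec(W, x), b)

x = [3.0, 4.0]
assert forward_two(x) == forward_one(x)  # identical outputs
```

The same algebra extends to convolutions (a convolution is linear in its input), which is why a grafted block of consecutive convolutional layers can be fused into a single convolutional layer at deployment time.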
Keywords
Convolutional Network, Complex Network, Convolutional Neural Network, Building Blocks, Single Layer, ImageNet, Linear Layer, Target Task, Consecutive Layers, Linear Counterparts, Linear Block, Network Pruning, Learning Rate, Convolutional Layers, Computational Resources, Data Augmentation, Active Layer, Multilayer Perceptron, Single GPU, Pruning Method, Single Convolutional Layer