An Energy-Efficient and Area-Efficient Depthwise Separable Convolution Accelerator with Minimal On-Chip Memory Access

Yi Chen, Jie Lou, Christian Lanius, Florian Freye, Johnson Loh, Tobias Gemmeke

2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), 2023

Abstract
Depthwise separable convolution (DSC) has emerged as a crucial building block for lightweight convolutional neural networks (CNNs). In this paper, we present a hardware accelerator for DSC that achieves 100% utilization of the processing element (PE) array for depthwise convolution (DWC) and up to 98% utilization for pointwise convolution (PWC), while also reducing latency. By partitioning the input feature map (ifmap) SRAM of the DWC into three banks, we minimize memory access and maximize data reuse. The input activations and weights need to be loaded only once from SRAM to the PEs for both DWC and PWC. Additionally, to support efficient operation across different layers, we present a layerwise matching method. The proposed DSC accelerator is implemented in 22 nm FDSOI technology and validated using MobileNetV1 on the CIFAR10 dataset. Post-layout results show that the accelerator operates at 1 GHz and achieves an energy efficiency of 5.07 (3.96) TOPS/W and an area efficiency of 519.2 (461.52) GOPS/mm² for DWC (PWC) at 0.8 V. When the supply voltage is scaled down to 0.5 V, the energy efficiency increases to 13.64 TOPS/W for DWC and 10.64 TOPS/W for PWC.
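For readers unfamiliar with the two stages named in the abstract, the following is a minimal pure-Python sketch of what DWC and PWC compute, together with the standard multiply-accumulate (MAC) count comparison against an ordinary convolution. It is not the accelerator's dataflow or the authors' code. The function names, the nested-list tensor layout, and the stride-1, no-padding setting are all assumptions made for illustration.

```python
# Illustrative sketch only; names, layout, stride 1 and no padding are assumptions.

def depthwise_conv(ifmap, kernels):
    """DWC: each channel is filtered by its own K x K kernel.

    ifmap:   [C][H][W] nested lists
    kernels: [C][K][K] nested lists
    """
    channels = len(ifmap)
    height, width = len(ifmap[0]), len(ifmap[0][0])
    k = len(kernels[0])
    out_h, out_w = height - k + 1, width - k + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0
                for i in range(k):
                    for j in range(k):
                        acc += ifmap[c][y + i][x + j] * kernels[c][i][j]
                out[c][y][x] = acc
    return out


def pointwise_conv(ifmap, weights):
    """PWC: a 1 x 1 convolution that mixes channels at every pixel.

    ifmap:   [C_in][H][W] nested lists
    weights: [C_out][C_in] nested lists
    """
    c_in = len(ifmap)
    height, width = len(ifmap[0]), len(ifmap[0][0])
    return [
        [
            [sum(w_row[c] * ifmap[c][y][x] for c in range(c_in))
             for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


def mac_counts(out_h, out_w, c_in, c_out, k):
    """Return (standard convolution MACs, DWC + PWC MACs) for one layer."""
    standard = out_h * out_w * c_in * c_out * k * k
    dwc = out_h * out_w * c_in * k * k
    pwc = out_h * out_w * c_in * c_out
    return standard, dwc + pwc
```

The ratio of the two MAC counts is 1/C_out + 1/K², which is why DSC layers are attractive for lightweight CNNs such as MobileNetV1. Calling `pointwise_conv(depthwise_conv(ifmap, kernels), weights)` chains the two stages into one DSC layer.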
Keywords
Depthwise separable convolution, hardware accelerator, PE utilization, energy-efficient design, area-efficient design, memory access