CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware
arXiv (2024)
Abstract
With the recent growth in demand for large-scale deep neural networks,
compute-in-memory (CiM) has emerged as a prominent solution to alleviate the
bandwidth and on-chip interconnect bottlenecks that constrain von Neumann
architectures. However, constructing CiM hardware poses a challenge: any
specific memory hierarchy, in terms of cache sizes and memory bandwidth at
different interfaces, may not be ideally matched to a given neural network's
attributes, such as tensor dimensions and arithmetic intensity, leading to
suboptimal and under-performing systems. Although neural architecture search
(NAS) techniques have succeeded in yielding efficient sub-networks for a
given hardware metric budget (e.g., DNN execution time or latency), they
assume the hardware configuration to be frozen, which often yields
sub-optimal sub-networks. In this paper, we present CiMNet, a framework that
jointly searches for optimal sub-networks and hardware configurations for
CiM architectures, creating a Pareto-optimal frontier of downstream task
accuracy and execution metrics (e.g., latency). The proposed framework
captures the complex interplay between a sub-network's performance and CiM
hardware configuration choices, including bandwidth, processing element
size, and memory size. Exhaustive experiments on model architectures from
both the CNN and Transformer families demonstrate the efficacy of CiMNet in
finding co-optimized sub-networks and CiM hardware configurations.
Specifically, at ImageNet classification accuracy similar to the baseline
ViT-B, optimizing the model architecture alone improves performance (i.e.,
reduces workload execution time) by 1.7x, while optimizing both the model
architecture and the hardware configuration improves it by 3.1x.