P³ViT: A CIM-Based High-Utilization Architecture With Dynamic Pruning and Two-Way Ping-Pong Macro for Vision Transformer

IEEE Transactions on Circuits and Systems I: Regular Papers (2023)

Abstract
Transformers have made remarkable contributions to natural language processing (NLP) and many other fields. Recently, transformer-based models have achieved state-of-the-art (SOTA) performance on computer vision tasks compared with traditional convolutional neural networks (CNNs). Unfortunately, existing CNN accelerators cannot efficiently support transformers because of the high computational overhead and redundant data accesses associated with the 'KQV' matrix operations in transformer models. If the recently developed NLP transformer accelerators are applied to vision transformer (ViT) models, their efficiency decreases due to three challenges. 1) Redundant data storage and access still exist in ViT data-flow scheduling. 2) For matrix transposition in transformer models, previous transpose-operation schemes lack flexibility, resulting in extra area overhead. 3) The sparse acceleration schemes designed for NLP in prior transformer accelerators cannot efficiently accelerate ViT, which has relatively fewer tokens. To overcome these challenges, we propose P³ViT, a computing-in-memory (CIM)-based architecture that efficiently accelerates ViT and achieves high utilization in data-flow scheduling. There are three key contributions: 1) The P³ViT architecture supports three ping-pong pipeline scheduling modes, namely the inter-core parallel and intra-core ping-pong pipeline mode (IEP-IAP³), the inter-core pipeline and parallel mode (IEP²), and the full parallel mode, to eliminate redundant memory accesses. 2) A two-way ping-pong CIM macro is proposed, which can be configured in a regular calculation mode and a transpose calculation mode to adapt to both the Q×K^T and A×V tasks. 3) P³ViT also runs a small prediction network that hierarchically and dynamically prunes redundant tokens down to a standard number, enabling high-throughput and high-utilization attention computation; a sketch of this idea follows below. Measurements show that P³ViT achieves 1.13× higher energy efficiency than the state-of-the-art transformer accelerator and 30.8× and 14.6× speedups compared to a CPU and a GPU, respectively.
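To make the third contribution concrete, here is a minimal NumPy sketch of the general idea: a small scoring ("prediction") network ranks tokens and keeps only a fixed number of them before the Q×K^T and A×V attention steps. The linear scorer, the function names, and the keep count are illustrative assumptions, not the paper's actual prediction network or hardware data flow.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prune_tokens(tokens, w_score, keep):
    """Hypothetical token pruning: score each token with a tiny linear
    'prediction network' and keep the top-`keep` tokens so the attention
    workload is reduced to a standard, fixed size."""
    scores = tokens @ w_score                  # (N,) importance score per token
    idx = np.argsort(scores)[::-1][:keep]      # indices of the most important tokens
    return tokens[np.sort(idx)]                # preserve original token order

def attention(x, w_q, w_k, w_v):
    """Standard single-head attention: Q x K^T followed by A x V."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    a = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return a @ v

rng = np.random.default_rng(0)
d, n, keep = 64, 197, 128                      # embed dim, token count, pruned count (illustrative)
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
w_score = rng.standard_normal(d) * 0.02

x_pruned = prune_tokens(x, w_score, keep)      # hierarchical pruning would repeat this per stage
out = attention(x_pruned, w_q, w_k, w_v)
print(out.shape)                               # (128, 64): attention runs only over the kept tokens
```

Because the pruned token count is fixed, the downstream Q×K^T and A×V matrix sizes are known in advance, which is what allows the CIM macros to stay highly utilized.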
Keywords
Transformers, Heuristic algorithms, Task analysis, Computational modeling, Pipelines, Natural language processing, Common Information Model (computing), Vision transformer (ViT), computing-in-memory (CIM), accelerator, dynamic prune, prediction network, CMOS