Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
CoRR (2024)
Abstract
Resistive Random Access Memory (ReRAM) has emerged as a promising platform
for deep neural networks (DNNs) due to its support for parallel in-situ
matrix-vector multiplication. However, hardware failures, such as stuck-at
faults, can cause significant prediction errors during model inference. While
additional crossbars can be used to tolerate these failures, they incur extra
overhead in space, energy, and cost. In this paper, we propose a fault
protection mechanism that incurs zero space cost. Our approach comprises:
1) differentiable structured
pruning of rows and columns to reduce model redundancy, 2) weight duplication
and voting for robust outputs, and 3) embedding of duplicated most significant
bits (MSBs) into the model weights. We evaluate our method with the BERT model
on nine tasks of the GLUE benchmark, and the experimental results demonstrate
its effectiveness.
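
The abstract only outlines the mechanism, so the following is a minimal sketch of the duplication-and-voting idea (step 2): it simulates stuck-at faults on a quantized weight matrix and masks them with an elementwise median over three replicas. The function names, 8-bit quantization, 1% fault rate, and three-way replication are illustrative assumptions, not the paper's actual implementation.

import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=8):
    # Uniform symmetric quantization to signed integers, a common ReRAM mapping.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int64), scale

def inject_stuck_at_faults(wq, bits=8, fault_rate=0.01):
    # Force randomly chosen bit cells to stuck-at-0 or stuck-at-1.
    out = wq.copy()
    for b in range(bits):
        faulty = rng.random(wq.shape) < fault_rate   # defective cells at bit b
        stuck = rng.integers(0, 2, wq.shape)         # value each cell is stuck at
        cur = (out >> b) & 1                         # current bit value
        out = np.where(faulty, out + ((stuck - cur) << b), out)
    return out

# Toy layer: a 64x64 weight matrix and one input activation vector.
w = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
wq, scale = quantize(w)
clean = (wq * scale) @ x                             # fault-free reference output

# Three independently faulty replicas, as if duplicated into crossbar space
# freed by structured pruning; vote with an elementwise median over outputs.
outputs = np.stack([(inject_stuck_at_faults(wq) * scale) @ x for _ in range(3)])

print("single-copy error:", np.abs(outputs[0] - clean).mean())
print("voted error:      ", np.abs(np.median(outputs, axis=0) - clean).mean())

Under these assumptions, the voted output typically shows a noticeably smaller mean error than any single faulty copy, since independent faults rarely corrupt the same output element in a majority of replicas.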