System-level hardware failure prediction using deep learning

Proceedings of the 56th Annual Design Automation Conference 2019(2019)

引用 65|浏览85
暂无评分
摘要
Disk and memory faults are the leading causes of server breakdown. A proactive solution is to predict such hardware failure at the runtime and then isolate the hardware at risk and backup the data. However, the current model-based predictors are incapable of using the discrete time-series data, such as the values of device attributes, which conveys high-level information of the device behavior. In this paper, we propose a novel deep-learning based prediction scheme for system-level hardware failure prediction. We normalize the distribution of samples' attributes from different vendors to make use of diverse training sets. We propose a temporal Convolution Neural Network based model that is insensitive to the noise in the time dimension. Finally, we design a loss function to train the model with extremely imbalanced samples effectively. Experimental results from an open S.M.A.R.T data set and an industrial data set show the effectiveness of the proposed scheme.
更多
查看译文
关键词
Hardware failure, temporal CNN, transfer learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要