Exploiting Frame Similarity for Efficient Inference on Edge Devices.

IEEE International Conference on Distributed Computing Systems (ICDCS), 2022

Abstract
Deep neural networks (DNNs) are widely used in computer vision tasks because they can achieve very high accuracy. However, the large number of parameters in DNNs can lead to long inference times, making DNNs challenging to deploy on compute- and memory-constrained mobile/edge devices. To speed up DNN inference, some existing works employ compression (model pruning or quantization) or enhanced hardware; that is, most prior work focuses on improving the model structure or implementing custom accelerators. In contrast, in this paper we target the video data processed by edge devices and study the similarity between frames. Based on this similarity, we propose two runtime approaches that boost the performance of the inference process while maintaining high accuracy. Specifically, exploiting the similarity between successive video frames, we propose a frame-level compute reuse algorithm based on the motion vectors of each frame. With frame-level reuse, we can skip inference on 53% of frames with negligible overhead and less than 1% mAP (accuracy) drop on the object detection task. Additionally, we implement a partial inference scheme that enables region/tile-level reuse. Our experiments on a representative mobile device (a Pixel 3 phone) show that the proposed partial inference scheme achieves a 2x speedup over the baseline approach, which performs full inference on every frame. We integrate these two data reuse algorithms to accelerate neural network inference and improve its energy efficiency: for each frame in the video, we dynamically select between (i) performing a full inference, (ii) performing a partial inference, or (iii) skipping inference altogether.
Our experimental evaluations on six different videos reveal that the proposed schemes are up to 80% (56% on average) more energy-efficient and up to 2.2x faster than the conventional scheme, which performs full inference on every frame, while losing less than 2% accuracy. Our analysis further indicates that our approach outperforms state-of-the-art work with respect to accuracy and/or performance/energy savings.
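The per-frame decision described above can be sketched as a simple dispatcher: the more overall motion a frame's motion vectors indicate, the more of the DNN we re-run. This is a minimal illustrative sketch only; the function names, the mean-magnitude statistic, and the thresholds are assumptions for illustration, not the authors' actual implementation.

```python
def mean_motion_magnitude(motion_vectors):
    """Average L1 magnitude of a frame's motion vectors [(dx, dy), ...]."""
    if not motion_vectors:
        return 0.0
    return sum(abs(dx) + abs(dy) for dx, dy in motion_vectors) / len(motion_vectors)

def choose_inference_mode(motion_vectors, skip_thresh=0.5, partial_thresh=4.0):
    """Pick an inference mode for the current frame (thresholds are hypothetical).

    Low overall motion  -> 'skip':    reuse the previous frame's results.
    Moderate motion     -> 'partial': re-run the DNN only on changed regions/tiles.
    Large motion        -> 'full':    run full inference on the frame.
    """
    m = mean_motion_magnitude(motion_vectors)
    if m < skip_thresh:
        return "skip"
    if m < partial_thresh:
        return "partial"
    return "full"
```

In practice, the motion vectors would come for free from the video decoder (they are part of the compressed bitstream), which is why the dispatch overhead can be negligible relative to running the DNN.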
Keywords
Mobile Computing, DNN Inference, Energy Efficiency, Video Analysis, Motion Vectors, Object Detection