Optimized Co-Scheduling of Mixed-Precision Neural Network Accelerator for Real-Time Multitasking Applications

Wei Jiang, Ziwei Song, Jinyu Zhan, Zhiyuan He, Xiangyu Wen, Ke Jiang

Journal of Systems Architecture (JSA) (2020), CCF B, SCI Zone 2

University of Electronic Science and Technology of China | Veoneer Sweden AB

Abstract

Neural networks are increasingly applied in real-time and embedded Artificial Intelligence (AI) systems such as autonomous driving systems. Such resource-constrained systems cannot support the execution of neural-network-based tasks because of the high execution overhead of these tasks on general-purpose processors. Hence, we design real-time AI applications for embedded systems with CPU and FPGA (Field-Programmable Gate Array) coprocessors. A dedicated FPGA accelerates the neural network jobs, while the CPU processes the remaining jobs of the real-time multitasking applications. We devise an Idle-Aware Earliest Deadline First policy to co-schedule the AI applications on the hybrid CPU and FPGA coprocessors. Because implementing a neural network job on the FPGA accelerator with different precision configurations yields different execution times and accuracies, we also study the design optimization of real-time AI applications running on a mixed-precision neural network accelerator, with the goal of maximizing the accuracy-related rewards of all applications subject to real-time constraints. We formulate the problem as a multi-stage decision procedure and propose an efficient dynamic programming approach with two pruning policies that reduce the number of intermediate search states. Extensive experiments and real-life case studies demonstrate the efficiency of the proposed approaches.
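The multi-stage dynamic programming idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: each application contributes one periodic neural network job whose deadline equals its period, each precision configuration is a hypothetical (FPGA execution time, accuracy reward) pair, and the paper's Idle-Aware EDF analysis is replaced by the classic EDF utilization bound (total FPGA utilization at most 1). Each application is one decision stage. The two pruning rules shown (discarding infeasible states and discarding dominated states) are generic stand-ins, not necessarily the paper's two pruning policies, and the function name `select_precisions` is hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

# Hypothetical model (not from the paper): one (FPGA execution time, accuracy
# reward) pair per precision configuration of an application's NN job.
Option = Tuple[int, float]
State = Tuple[Fraction, float, List[int]]  # (utilization, total reward, picks)


def select_precisions(periods: List[int],
                      options: List[List[Option]]) -> Optional[State]:
    """Choose one precision option per application, maximizing total reward
    subject to FPGA utilization <= 1 (simplified EDF test, an assumption)."""
    states: List[State] = [(Fraction(0), 0.0, [])]
    # One decision stage per application.
    for period, opts in zip(periods, options):
        candidates: List[State] = []
        for util, reward, picks in states:
            for k, (exec_time, acc) in enumerate(opts):
                new_util = util + Fraction(exec_time, period)
                # Pruning 1: utilization only grows in later stages, so a
                # partial assignment that already exceeds 1 is discarded.
                if new_util <= 1:
                    candidates.append((new_util, reward + acc, picks + [k]))
        # Pruning 2: keep only non-dominated states. After sorting by
        # utilization (ties: higher reward first), a state survives only if
        # its reward beats every state with lower or equal utilization.
        candidates.sort(key=lambda s: (s[0], -s[1]))
        states = []
        best = float("-inf")
        for state in candidates:
            if state[1] > best:
                states.append(state)
                best = state[1]
        if not states:
            return None  # no feasible precision assignment
    return max(states, key=lambda s: s[1])


# Two applications with periods 10 and 20; two precision options each.
result = select_precisions([10, 20],
                           [[(2, 0.70), (5, 0.90)], [(4, 0.60), (12, 0.95)]])
print(result[2])  # → [0, 1]
```

Dominance pruning is safe in this simplified model because utilization and reward are both additive across stages: a partial assignment with higher utilization and no higher reward can never be completed into a better final solution, so only the (utilization, reward) Pareto frontier needs to be carried from one stage to the next.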

Keywords

Neural network accelerator, Mixed-precision, Real-time multitasking application, Co-scheduling, Design optimization
