Guaranteeing That Multilevel Prioritized DNN Models on an Embedded GPU Have Inference Performance Proportional to Respective Priorities

IEEE Embedded Systems Letters (2022)

Abstract
When multiple deep neural networks (DNNs) use an embedded GPU as an accelerator, adjusting the CPU time of the process hosting each DNN on a priority basis does not guarantee that higher priority DNNs also receive the GPU preferentially. To address this problem, we propose a methodology that uses PyTorch models without modification while providing additional advantages. First, it improves the response performance of higher priority DNNs by letting each DNN occupy the GPU preferentially, in proportion to the priority granted to its hosting process. Second, it reduces the execution time of each DNN by removing the multicontext overhead that arises when every DNN runs in an independent process: DNNs are executed as threads of a single process to overcome the limitations of pure Python, and multiple CUDA streams allow operations of different DNNs to run concurrently inside the GPU.
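The arrangement the abstract describes, a single process hosting every DNN as a thread with one CUDA stream per DNN whose priority mirrors that of the hosting work, can be approximated in PyTorch roughly as follows. This is an illustrative sketch, not the authors' implementation; the models, iteration count, and priority values (CUDA treats lower stream-priority numbers as higher priority, e.g., -1 above 0) are assumptions.

# Minimal sketch (assumptions noted above), not the paper's code:
# several PyTorch DNNs run as threads of one process (one CUDA context),
# each on its own CUDA stream with a priority matching its importance.
import threading
import torch
import torchvision.models as models

device = torch.device("cuda")

def run_dnn(model, priority, n_iters=100):
    # One CUDA stream per DNN; lower `priority` values are scheduled
    # preferentially by the GPU (torch.cuda.Stream exposes this directly).
    stream = torch.cuda.Stream(device=device, priority=priority)
    with torch.cuda.stream(stream), torch.no_grad():
        x = torch.randn(1, 3, 224, 224, device=device)
        for _ in range(n_iters):
            model(x)
    stream.synchronize()

# Two DNNs sharing one process, and hence one CUDA context: an assumed
# high-priority resnet50 and a default-priority mobilenet_v2.
high = models.resnet50().to(device).eval()
low = models.mobilenet_v2().to(device).eval()

threads = [
    threading.Thread(target=run_dnn, args=(high, -1)),
    threading.Thread(target=run_dnn, args=(low, 0)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

Because all threads share one process, only a single CUDA context is created, avoiding the multicontext overhead of a process-per-DNN design; PyTorch releases the GIL during GPU kernel execution, so the threads do not serialize on pure-Python code.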
Keywords
Deep neural network (DNN) scheduling, embedded GPU, multi-DNN, priority