Genetic Algorithm-based Framework for Layer-Fused Scheduling of Multiple DNNs on Multi-core Systems

2023 Design, Automation & Test in Europe Conference & Exhibition (DATE 2023)

Abstract
Heterogeneous multi-core architectures are becoming a popular design choice to accelerate the inference of modern deep neural networks (DNNs). This trend allows for more flexible mappings onto the cores, but shifts the challenge to keeping all cores busy given limited network parallelism. To this end, layer-fused processing, where several layers are mapped simultaneously to an architecture and executed in a depth-first fashion, has shown promising opportunities to maximize core utilization. However, SotA mapping frameworks fail to efficiently map layer-fused DNNs onto heterogeneous multi-core architectures because they ignore (1) on-chip weight traffic and (2) inter-core communication congestion. This work tackles these shortcomings by introducing a weight memory manager (WMM), which manages the weights present in a core and models the cost of re-fetching weights. Secondly, the inter-core communication (ICC) of feature data is modeled through a limited-bandwidth bus and optimized through a contention-aware scheduler (CAS). Relying on these models, a genetic algorithm is developed to optimally schedule the different DNN layers across the different cores. The impact of our enhanced modeling, core allocation, and scheduling capabilities is shown in several experiments, demonstrating a 52% decrease in latency and a 38% decrease in energy when mapping a multi-DNN inference workload, consisting of ResNet-18, MobileNet-V2, and Tiny YOLO V2, onto a heterogeneous multi-core platform compared to iso-area homogeneous architectures.
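The core-allocation idea described in the abstract, using a genetic algorithm to assign DNN layers to cores while accounting for inter-core feature traffic, can be sketched as follows. This is a minimal illustrative sketch only: the layer costs, number of cores, ICC penalty, and fitness model below are hypothetical placeholders and do not reproduce the paper's WMM/CAS cost models.

```python
import random

# Chromosome: one core index per DNN layer.
# Fitness (to minimize): makespan (busiest core's load) plus a flat
# penalty for each core-to-core feature transfer between consecutive
# layers, a crude stand-in for bus contention. All values illustrative.

LAYER_COST = [4, 3, 5, 2, 6, 3]   # per-layer compute cost (arbitrary units)
NUM_CORES = 3
ICC_PENALTY = 2                   # cost when consecutive layers change cores

def fitness(assign):
    load = [0] * NUM_CORES
    for layer, core in enumerate(assign):
        load[core] += LAYER_COST[layer]
    icc = sum(ICC_PENALTY for a, b in zip(assign, assign[1:]) if a != b)
    return max(load) + icc

def evolve(pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(NUM_CORES) for _ in LAYER_COST]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, len(LAYER_COST))
            child = p1[:cut] + p2[cut:]       # one-point crossover
            if rng.random() < 0.2:            # mutation: reassign one layer
                child[rng.randrange(len(child))] = rng.randrange(NUM_CORES)
            children.append(child)
        pop = elite + children
    best = min(pop, key=fitness)
    return best, fitness(best)

best, cost = evolve()
print(best, cost)
```

On this toy instance the GA trades off load balancing against layer fission across cores: splitting layers evenly lowers the makespan but each split adds ICC cost, mirroring the tension the paper's contention-aware scheduler addresses.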
Keywords
deep learning accelerators,layer fusion,heterogeneous multi-core,genetic algorithm