High Throughput FPGA-Based Object Detection via Algorithm-Hardware Co-Design

ACM Transactions on Reconfigurable Technology and Systems(2024)

引用 0|浏览10
暂无评分
摘要
Object detection and classification is a key task in many computer vision applications such as smart surveillance and autonomous vehicles. Recent advances in deep learning have significantly improved the quality of results achieved by these systems, making them more accurate and reliable in complex environments. Modern object detection systems make use of lightweight convolutional neural networks (CNNs) for feature extraction, coupled with single-shot multi-box detectors (SSDs) that generate bounding boxes around the identified objects along with their classification confidence scores. Subsequently, a non-maximum suppression (NMS) module removes any redundant detection boxes from the final output. Typical NMS algorithms must wait for all box predictions to be generated by the SSD-based feature extractor before processing them. This sequential dependency between box predictions and NMS results in a significant latency overhead and degrades the overall system throughput, even if a high-performance CNN accelerator is used for the SSD feature extraction component. In this paper, we present a novel pipelined NMS algorithm that eliminates this sequential dependency and associated NMS latency overhead. We then use our novel NMS algorithm to implement an end-to-end fully pipelined FPGA system for low-latency SSD-MobileNet-V1 object detection. Our system, implemented on an Intel Stratix 10 FPGA, runs at 400 MHz and achieves a throughput of 2,167 frames per second with an end-to-end batch-1 latency of 2.13 ms. Our system achieves 5.3× higher throughput and 5× lower latency compared to the best prior FPGA-based solution with comparable accuracy.
更多
查看译文
关键词
FPGA accelerator,object detection,algorithm-hardware co-design,neural networks
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要