MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection
AAAI(2024)
Abstract
State-of-the-art video object detection methods maintain a memory structure,
either a sliding window or a memory queue, to enhance the current frame using
attention mechanisms. However, we argue that these memory structures are not
efficient or sufficient because of two implied operations: (1) concatenating
all features in memory for enhancement, leading to a heavy computational cost;
(2) frame-wise memory updating, preventing the memory from capturing more
temporal information. In this paper, we propose a multi-level aggregation
architecture via memory bank called MAMBA. Specifically, our memory bank
employs two novel operations to eliminate the disadvantages of existing
methods: (1) a light-weight key-set construction, which can significantly reduce
the computational cost; (2) a fine-grained feature-wise updating strategy, which
enables our method to utilize knowledge from the whole video. To better enhance
features from complementary levels, i.e., feature maps and proposals, we
further propose a generalized enhancement operation (GEO) to aggregate
multi-level features in a unified manner. We conduct extensive evaluations on
the challenging ImageNetVID dataset. Compared with existing state-of-the-art
methods, our method achieves superior performance in terms of both speed and
accuracy. More remarkably, MAMBA achieves mAP of 83.7/84.6
with ResNet-101. Code is available at
https://github.com/guanxiongsun/video_feature_enhancement.
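
The two memory operations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the key set is chosen or which stored features are overwritten, so uniform random sampling and random replacement are assumptions here, and the names `MemoryBank`, `update`, `key_set`, and `enhance` are hypothetical.

```python
import math
import random


class MemoryBank:
    """Toy feature-level memory. Random replacement and random key
    sampling are assumptions, not details taken from the paper."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.features = []  # each entry is one feature vector (list of floats)
        self.rng = random.Random(seed)

    def update(self, frame_features):
        # Feature-wise update: each vector is written individually, so an old
        # frame is never evicted wholesale as in a frame-wise queue.
        for feat in frame_features:
            if len(self.features) < self.capacity:
                self.features.append(feat)
            else:
                slot = self.rng.randrange(self.capacity)
                self.features[slot] = feat

    def key_set(self, size):
        # Light-weight key set: attend to a small sample, not the whole bank.
        size = min(size, len(self.features))
        return self.rng.sample(self.features, size)


def enhance(query, keys):
    """Add a softmax-attention-weighted sum of the keys to the query."""
    if not keys:
        return list(query)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    out = list(query)
    for w, key in zip(weights, keys):
        for i, k in enumerate(key):
            out[i] += (w / total) * k
    return out
```

In this sketch the attention cost depends on the key-set size rather than on the number of stored features, and features from early frames can survive in the bank for as long as they are not overwritten.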
Keywords
memory bank, detection, aggregation, video, multi-level