GrootVL: Tree Topology is All You Need in State Space Model
arxiv(2024)
Abstract
The state space models, employing recursively propagated features,
demonstrate strong representation capabilities comparable to Transformer models
and superior efficiency. However, constrained by the inherent geometric
constraints of sequences, it still falls short in modeling long-range
dependencies. To address this issue, we propose the GrootVL network, which
first dynamically generates a tree topology based on spatial relationships and
input features. Then, feature propagation is performed based on this graph,
thereby breaking the original sequence constraints to achieve stronger
representation capabilities. Additionally, we introduce a linear complexity
dynamic programming algorithm to enhance long-range interactions without
increasing computational cost. GrootVL is a versatile multimodal framework that
can be applied to both visual and textual tasks. Extensive experiments
demonstrate that our method significantly outperforms existing structured state
space models on image classification, object detection and segmentation.
Besides, by fine-tuning large language models, our approach achieves consistent
improvements in multiple textual tasks at minor training cost.
MoreTranslated text
AI Read Science
Must-Reading Tree
Example
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined