Lightweight Scene Text Recognition Based on Transformer.

Xin Luan, Jinwei Zhang, Miaomiao Xu, Wushouer Silamu, Yanbing Li

Sensors (2023)

Abstract
Scene text recognition (STR) is an active research area in computer vision that aims to recognize text in natural scenes automatically. Current attention-based encoder-decoder frameworks struggle to align feature regions precisely with the target characters when dealing with complex and low-quality images, a phenomenon known as attention drift. In addition, with the rise of the Transformer, growing parameter counts lead to higher computational costs. To address these problems, and building on recent research on the Vision Transformer (ViT), we use an additional position-enhancement branch to alleviate attention drift and dynamically fuse position information with visual information to achieve better recognition accuracy. The experimental results demonstrate that our model achieves a 3% higher average recognition accuracy on the test set compared to the baseline. Meanwhile, our model retains the advantages of a small number of parameters and fast inference speed, achieving a good balance between accuracy, speed, and computational load.
Keywords
scene text recognition, transformer, attention mechanism
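
Illustrative sketch
The abstract mentions dynamically fusing position information with visual information but does not say how the fusion is computed. One common way to realize such a fusion is a learned per-element gate; the minimal sketch below assumes that design purely for illustration and may differ from the paper's actual method. The names gated_fusion, w_v, w_p, and bias are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, position, w_v, w_p, bias):
    """Fuse a visual feature vector with a position feature vector.

    ASSUMPTION: a per-element sigmoid gate, chosen for illustration only;
    the paper's abstract does not specify the fusion mechanism.

        gate[i]  = sigmoid(w_v[i] * visual[i] + w_p[i] * position[i] + bias[i])
        fused[i] = gate[i] * visual[i] + (1 - gate[i]) * position[i]

    All arguments are equal-length lists of floats.
    """
    sizes = {len(visual), len(position), len(w_v), len(w_p), len(bias)}
    if len(sizes) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for v, p, wv, wp, b in zip(visual, position, w_v, w_p, bias):
        gate = sigmoid(wv * v + wp * p + b)
        # The gate decides how much of each source survives in the output.
        fused.append(gate * v + (1.0 - gate) * p)
    return fused


if __name__ == "__main__":
    visual = [1.0, 2.0]
    position = [3.0, 4.0]
    zeros = [0.0, 0.0]
    # With zero weights and zero bias the gate is 0.5 everywhere,
    # so the result is the element-wise mean of the two inputs.
    print(gated_fusion(visual, position, zeros, zeros, zeros))
```

In a trained model the gate parameters would be learned, so the network could lean on position features where the visual evidence is weak and on visual features elsewhere.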