Reading Scene Text by Fusing Visual Attention with Semantic Representations

International Multimedia Conference (2021)

Abstract
Recognizing text in an unconstrained environment is a challenging task in computer vision. Many prevalent approaches to it employ a recurrent neural network that is difficult to train or rely heavily on sophisticated model designs for sequence modeling. In contrast to these methods, we propose a unified lexicon-free framework to enhance the accuracy of text recognition using only attention and convolution. We use a relational attention module to leverage visual patterns and word representations. To ensure that the predicted sequence captures the contextual dependencies within a word, we embed linguistic dependencies from a language model into the optimization framework. The proposed mutual attention model is an ensemble of visual cues and linguistic contexts that together improve performance. The results of experiments show that our system achieves state-of-the-art performance on datasets of texts from regular and irregular scenes. It also significantly enhances recognition performance on noisy scanned documents.
Keywords
Text recognition, Multi-modal fusion, Convolutional neural network