Text Position-Aware Pixel Aggregation Network With Adaptive Gaussian Threshold: Detecting Text in the Wild

Jiayu Xu, Ailiang Lin, Jinxing Li, Guangming Lu

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2024)

Abstract
In recent years, deep learning has significantly improved scene text detection performance, and current segmentation-based scene text detectors can produce compact bounding boxes for irregular text. However, these methods still struggle with crowded or overlapping text because of conglutination between adjacent text instances in the segmentation results. To address this issue, we propose a more accurate scene text detector, the Text Position-Aware Pixel Aggregation Network (TPPAN). Specifically, in the Adaptively Text Kernel Thresholding (ATKT) module, a Gaussian threshold representation is learned adaptively, rather than set to a constant, to obtain more accurate text kernels. The Text Position-Aware Region Pixel Aggregation (TPAR-PA) module then predicts the text regions in relative positions and generates more accurate text contours. Extensive experiments demonstrate that the resulting detector achieves state-of-the-art performance on multi-oriented and curved scene text benchmarks.
Keywords
Kernel, Shape, Feature extraction, Transformers, Detectors, Deep learning, Pipelines, Arbitrary-shaped scene text detection, Text instance segmentation, Self-attention