Binarizing Documents by Leveraging both Space and Frequency
arXiv (2024)
Abstract
Document Image Binarization is a well-known problem in Document Analysis and
Computer Vision, although it is far from being solved. One of the main
challenges of this task is that documents generally exhibit degradations and
acquisition artifacts that can vary greatly across the page. Nonetheless,
even when dealing with a local patch of the document, taking into account the
overall appearance of a wide portion of the page can ease the prediction by
enriching it with semantic information on the ink and background conditions. In
this respect, approaches able to model both local and global information have
proven suitable for this task. In particular, recent applications of
Vision Transformer (ViT)-based models, able to model short and long-range
dependencies via the attention mechanism, have demonstrated their superiority
over standard Convolution-based models, which instead struggle to model global
dependencies. In this work, we propose an alternative solution based on the
recently introduced Fast Fourier Convolutions, which overcomes the limitation
of standard convolutions in modeling global information while requiring fewer
parameters than ViTs. We validate the effectiveness of our approach via
extensive experimental analysis considering different types of degradations.
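
To illustrate why an operation in the frequency domain can capture global
information, the following is a minimal NumPy sketch, not the architecture
proposed in the paper. It compares the spatial footprint of a plain 3x3
convolution with that of a toy spectral transform (FFT, pointwise multiplication,
inverse FFT) applied to a single-pixel impulse. The function names, the 16x16
input size, and the random weights are all assumptions made for illustration; an
actual Fast Fourier Convolution layer additionally uses learned channel-mixing
layers and a parallel local branch.

```python
import numpy as np


def local_conv3x3(x, kernel):
    """Plain 3x3 convolution with zero padding (hypothetical helper).

    Each output pixel depends only on a 3x3 neighborhood of the input.
    """
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def spectral_transform(x, weight):
    """Toy global branch (hypothetical helper): FFT, pointwise op, inverse FFT.

    Every frequency coefficient depends on every input pixel, so a pointwise
    operation in the frequency domain affects the whole image.
    """
    freq = np.fft.rfft2(x)          # shape (H, W // 2 + 1), complex
    freq = freq * weight            # pointwise reweighting of frequencies
    return np.fft.irfft2(freq, s=x.shape)


# A single "ink" pixel in the middle of an otherwise empty 16x16 patch.
x = np.zeros((16, 16))
x[8, 8] = 1.0

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))
weight = rng.normal(size=(16, 9)) + 1j * rng.normal(size=(16, 9))

local_out = local_conv3x3(x, kernel)
global_out = spectral_transform(x, weight)

# Number of output pixels influenced by the single input pixel.
local_support = int(np.count_nonzero(np.abs(local_out) > 1e-12))
global_support = int(np.count_nonzero(np.abs(global_out) > 1e-12))
print("pixels affected by 3x3 convolution:", local_support)
print("pixels affected by spectral transform:", global_support)
```

In this sketch the 3x3 convolution spreads the impulse over at most nine pixels,
while the spectral transform spreads it over a much larger part of the patch in
a single step. This is the intuition behind using Fourier-domain operations to
obtain an image-wide receptive field without stacking many layers.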