Digitizing History: Transitioning Historical Paper Documents to Digital Content for Information Retrieval and Mining-A Comprehensive Survey

IEEE Transactions on Computational Social Systems (2024)

Abstract
Historical document processing (HDP) is the task of converting physically bound historical archives into a centralized, web-based digital form for conservation, preservation, and ubiquitous access. Beyond conserving these invaluable collections, the key aim is to make geographically distributed historical repositories available for information mining and retrieval in a centralized, touchless web mode. Because the problem interests scholars across disciplines, it has attracted many researchers and produced an immense body of literature on digitization strategies. The present study first assembles the tasks essential to HDP into a pipeline and outlines a generic workflow for historical document digitization. It then reports the latest task-specific state of the art, briefly discussing the methods and open challenges in handling images of historical printed and handwritten scripts. Next, grounded on various layout attributes, it discusses the evaluation metrics and datasets available for observational and analytical purposes. The study attempts to trace the contours of ongoing research and its bottlenecks, thus giving readers a comprehensive view of existing studies and revealing open avenues for future work.
Keywords
Archival images, handwritten, historical document, image processing, information retrieval, layout analysis, natural language processing, printed