Making Legacy Digital Content Accessible At Source

16TH INTERNATIONAL WEB FOR ALL CONFERENCE (WEB4ALL) (2019)

Abstract
Nearly three decades have passed since the Unicode standard was first published in 1991, yet a great deal of electronically generated content remains locked inside legacy encodings. This can be attributed to the lack of software support for Unicode at the time of content creation; because the target output was originally print, the problem went unnoticed. Later, to meet the growing demand for digital content, the same legacy-encoded content had to be exported to unsearchable PDFs/EPUBs. Conversion to Unicode has been a challenge because digital publishing applications cannot provide built-in conversion support for the multitude of legacy encodings. Conversion tools, even where available, are external to the source application and require manual effort not only for text export/import but also for correcting conversion errors and changes in document layout. We address this problem for the Devanagari script in cases where the digital publishing application is InDesign[1] or PageMaker and the textual content is stored as text rather than images. InDesign allows import of PageMaker documents, and InDesign's scripting interface allows the document content to be accessed and modified directly. The tools have been successfully used to convert legacy Devanagari content from 5 distinct legacy encodings in over 100 textbooks meant for K-12 schools in India.
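To illustrate the core of such a conversion, here is a minimal sketch in Python under assumed conditions. The glyph codes in the mapping table ("d", "f") are hypothetical stand-ins, not taken from any of the five encodings the paper handles. The reordering step reflects a genuine property of Devanagari legacy fonts: the short-i matra (U+093F) is stored before the consonant it visually precedes, whereas Unicode stores it after the consonant. Real converters must additionally handle conjuncts, half-forms, and context-dependent glyph mappings, which this sketch omits.

```python
# Sketch of legacy-to-Unicode conversion for Devanagari text.
# The mapping below is hypothetical; a real tool would carry one
# table per legacy encoding.
LEGACY_TO_UNICODE = {
    "d": "\u0915",  # DEVANAGARI LETTER KA (assumed legacy code)
    "f": "\u093F",  # DEVANAGARI VOWEL SIGN I (pre-base glyph in legacy fonts)
}

def convert(legacy_text: str) -> str:
    # Step 1: map each legacy glyph code to its Unicode character.
    mapped = [LEGACY_TO_UNICODE.get(ch, ch) for ch in legacy_text]

    # Step 2: reorder. Legacy fonts encode the short-i matra before the
    # consonant (visual order); Unicode requires it after (logical order).
    # Simplification: swap with the single following character only.
    out = []
    i = 0
    while i < len(mapped):
        if mapped[i] == "\u093F" and i + 1 < len(mapped):
            out.append(mapped[i + 1])  # consonant first...
            out.append(mapped[i])      # ...then the matra
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return "".join(out)

print(convert("fd"))  # legacy visual order -> "कि" (KA + VOWEL SIGN I)
```

In the paper's setting, logic of this kind would run inside an InDesign script so that converted text replaces the original in place, preserving the document layout; the sketch above shows only the encoding step, detached from any application.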