Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections

2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021

Abstract
The uncertainty caused by optical character recognition (OCR) noise has been a primary barrier for digital libraries (DL) to promote their curated datasets for research purposes, particularly when the datasets are fed into advanced language models with less transparency. To shed some light on this issue, this study evaluates the impacts of OCR noise on BERT models for encoding the intrinsic semant...
Keywords
Uncertainty, Semantics, Bit error rate, Coherence, Gain measurement, Libraries, Encoding