Estimating the Date of First Publication in a Large-Scale Digital Library.
JCDL (2017)
Abstract
One prerequisite for cultural analysis in large-scale digital libraries is an accurate estimate of the date of composition of each work they contain, as distinct from the date of publication of a particular edition. In this work, we present a manually annotated dataset of first dates of publication for three samples of books from the HathiTrust Digital Library (uniform random, uniform fiction, and stratified by decade), and empirically evaluate the disparity between these gold-standard labels and several approximations used in practice: the date of publication as provided in metadata, several deduplication methods, and automatic prediction of the date of composition from the text of the book. We find that a simple heuristic of metadata-based deduplication works best in practice, and that text-based composition dating is accurate enough to inform the analysis of "apparent time."
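The abstract does not spell out the deduplication heuristic, so the following is only an illustrative sketch of one plausible form it could take: group volumes whose normalized author and title match, and assign every volume in a group the earliest publication year found in the metadata. The record fields (`author`, `title`, `year`) and the function names are hypothetical and are not taken from the paper or from any HathiTrust API.

```python
import re


def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace (assumed matching rule)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def work_key(record):
    """Hypothetical work identifier: normalized (author, title) pair."""
    return (normalize(record["author"]), normalize(record["title"]))


def earliest_year_by_work(records):
    """Map each work key to the earliest metadata year seen for it."""
    earliest = {}
    for rec in records:
        key = work_key(rec)
        if key not in earliest or rec["year"] < earliest[key]:
            earliest[key] = rec["year"]
    return earliest


def estimate_first_publication(records):
    """Return, for each record in order, the earliest year among its duplicates."""
    earliest = earliest_year_by_work(records)
    return [earliest[work_key(rec)] for rec in records]
```

Under these assumptions, an 1896 reprint would inherit the year of an 1816 edition with the same normalized author and title, provided that earlier edition is also present in the collection. When it is not, the estimate falls back to the reprint's own metadata date.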
Keywords
Digital libraries, bibliographic metadata, publication date prediction