Developing and Aligning a Detailed Controlled Vocabulary for Artwork

New Trends in Database and Information Systems (2022)

Abstract
Controlled vocabularies have proved to be critical for data interoperability and accessibility. In the cultural heritage (CH) domain, descriptions of artworks are often given as free text, which makes filtering and searching burdensome (e.g. listing all artworks of a specific type). Despite being multilingual and quite detailed, the Getty’s Art & Architecture Thesaurus, a de facto standard for describing artworks, has low coverage for languages other than English and sometimes does not reach the degree of granularity required to describe specific niche artworks. We build upon the Italian Vocabulary of Artworks, developed by the Italian Ministry of Cultural Heritage (MIC), and a set of free text descriptions from ArCO, the knowledge graph of the Italian CH, to propose an extension of the Vocabulary of Artworks and align it to the Getty’s thesaurus. Our framework relies on text matching and natural language processing tools to suggest candidate alignments between free text and vocabulary terms, and between terms across vocabularies, with a human in the loop for validation and refinement. We produce 1,166 new terms (31% more than in the original vocabulary) and 1,330 links to the Getty’s thesaurus, with an estimated coverage of 21%.
Keywords
Controlled vocabularies, Cultural heritage, String-matching, Semantic similarity
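
The candidate-suggestion step mentioned in the abstract (string matching followed by human validation) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the 0.8 threshold, the use of Python's standard `difflib.SequenceMatcher` as the similarity measure, and the example terms are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in for any string matcher)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_candidates(source_terms, target_terms, threshold=0.8, top_k=3):
    """For each source term, rank target terms whose similarity meets the threshold.

    The threshold and top_k values are assumptions chosen for illustration.
    """
    suggestions = {}
    for term in source_terms:
        scored = [(similarity(term, target), target) for target in target_terms]
        kept = [pair for pair in scored if pair[0] >= threshold]
        # Highest score first; ties broken alphabetically for determinism.
        kept.sort(key=lambda pair: (-pair[0], pair[1]))
        suggestions[term] = [(target, score) for score, target in kept[:top_k]]
    return suggestions


def validate(suggestions, accept):
    """Human in the loop: keep only the pairs a reviewer callback accepts."""
    return [
        (source, target)
        for source, candidates in suggestions.items()
        for target, _score in candidates
        if accept(source, target)
    ]


if __name__ == "__main__":
    # Hypothetical terms, not taken from either vocabulary.
    local_terms = ["Altar piece", "reliquary"]
    thesaurus_terms = ["altarpieces", "reliquaries", "paintings"]
    candidates = suggest_candidates(local_terms, thesaurus_terms)
    # A real reviewer would inspect each pair; here everything is accepted.
    for source, target in validate(candidates, lambda s, t: True):
        print(f"{source} -> {target}")
```

In a setting like the one described, the reviewer callback would be replaced by an interactive validation step, and the character-level matcher could be swapped for or combined with a semantic similarity measure.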