An Orthographic Similarity Measure for Graph-Based Text Representations

Flexible Query Answering Systems, FQAS 2023 (2023)

Abstract
Computing the orthographic similarity between words, sentences, paragraphs and texts has become a basic functionality of many text mining and flexible querying systems, and the resulting similarity scores are often used to discover similar text documents. However, this task becomes complex for a corpus known for its orthographic inconsistencies and for its intricate interconnections on multiple levels (words, verses and full texts), as is the case with Byzantine book epigrams. In this paper, we propose a technique that tackles these two challenges by representing text in a graph and by computing similarity scores between multiple levels of the text, modelled as subgraphs, in a hierarchical manner. The similarity between all words is computed first; the similarity between all verses (resp. full texts) is then calculated from the previously determined similarity scores between the words (resp. verses). The resulting similarities, on each level, allow for deeper insight into the interconnected nature of (parts of) text collections, indicating how and to what degree the texts are related to each other.
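
The hierarchical idea described above (word scores feeding verse scores, verse scores feeding text scores) can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not specify the word-level measure or the aggregation rule, so the sketch assumes `difflib.SequenceMatcher` as a stand-in orthographic measure and a symmetric best-match average as a stand-in aggregation. It also uses plain lists rather than a graph database. All function names and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Stand-in orthographic similarity in [0, 1].

    Assumption: difflib's ratio replaces the (unspecified) measure of the paper.
    """
    return SequenceMatcher(None, a, b).ratio()


def level_similarity(xs, ys, sim) -> float:
    """Aggregate lower-level scores into one score for the next level up.

    Assumption: average, in both directions, of each unit's best match.
    """
    if not xs or not ys:
        return 0.0
    forward = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sim(x, y) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


def verse_similarity(v1: str, v2: str) -> float:
    """Verse level: built from word-level scores."""
    return level_similarity(v1.split(), v2.split(), word_similarity)


def text_similarity(t1: list, t2: list) -> float:
    """Text level: built from verse-level scores (a text is a list of verses)."""
    return level_similarity(t1, t2, verse_similarity)


# Invented, transliterated sample verses with one spelling variant.
verse_a = "theos kyrios"
verse_b = "theos kurios"
print(verse_similarity(verse_a, verse_b))
print(text_similarity([verse_a, "doxa soi"], [verse_b, "doxa si"]))
```

In this sketch a verse with a spelling variant still scores close to its counterpart, while an unrelated verse scores low, which is the kind of tolerance to orthographic inconsistency that the abstract motivates.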
Keywords
Text Analysis, Orthographic Similarity, Graph Databases, Fuzzy Graphs