Using syntactic information to identify plagiarism

EdAppsNLP 05: Proceedings of the second workshop on Building Educational Applications Using NLP (2005)

Abstract

Using keyword overlaps to identify plagiarism can result in many false negatives and false positives: substituting synonyms for one another reduces the similarity between works, making plagiarism difficult to recognize, while overlap in ambiguous keywords can falsely inflate the similarity of works that in fact differ in content. Plagiarism detection based on verbatim similarity can be rendered ineffective when works are paraphrased, even in superficial and immaterial ways. Considering linguistic information related to creative aspects of writing can improve identification of plagiarism by adding a crucial dimension to the evaluation of similarity: documents that share linguistic elements in addition to content are more likely to have been copied from each other. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing and show that combining information about linguistic similarity with similarity measurements based on tfidf-weighted keywords improves recognition of plagiarism over tfidf-weighted keywords alone.
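
The keyword-overlap baseline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenizer, the smoothed idf formula, the function names (tfidf_vectors, cosine) and the four toy sentences are all assumptions made here for illustration. It shows how a synonym-substituted paraphrase can score lower on tfidf-weighted keyword similarity than two sentences that merely share an ambiguous keyword ("bank").

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = raw term count * smoothed idf, where
    idf = ln((1 + N) / (1 + df)) + 1. The paper's exact weighting
    is not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented for this sketch.
docs = [
    "the boy ran quickly to the old house",          # original
    "the lad sprinted rapidly to the ancient home",  # paraphrase with synonyms
    "she sat on the bank of the river",              # "bank" as riverside
    "she sat in the bank waiting for a loan",        # "bank" as institution
]
vecs = tfidf_vectors(docs)

paraphrase_sim = cosine(vecs[0], vecs[1])  # same content, different words
ambiguous_sim = cosine(vecs[2], vecs[3])   # different content, shared keyword

print(f"original vs. paraphrase:       {paraphrase_sim:.3f}")
print(f"riverbank vs. financial bank:  {ambiguous_sim:.3f}")
```

On this toy data the paraphrase pair scores lower than the pair that differs in content, which is the false-negative and false-positive pattern the abstract attributes to keyword overlap. The syntactic features the paper adds are not sketched here, since the abstract does not specify them.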

Keywords

creative aspect, tfidf-weighted keyword, linguistic similarity, plagiarism detection, similarity measurement, verbatim similarity, linguistic information, share linguistic element, ambiguous keyword, crucial dimension, syntactic information
