A Large Scale Document-Term Matching Method Based on Information Retrieval.

Jinchao Feng, Runbo Zhao, Jianguo Jiang

ISPA/BDCloud/SocialCom/SustainCom (2022)

Abstract

With the growing prevalence of digital documents, large-scale document-term matching has become critical for document similarity tasks. However, traditional automatic term matching methods mainly treat the problem as classification, which ignores the macroscopic and ambiguous nature of domain terms. This paper proposes a document-term matching method based on information retrieval (IR-DTM). We use an automatic data collection method that alleviates the problem of insufficient data and greatly reduces manual workload. We then achieve semantic matching between macro-level terms and micro-level documents by expanding the words in each term, which improves the accuracy of document-term matching. Experimental results show that our method can effectively match domain terms to case documents automatically. We evaluate IR-DTM with R@1, R@5, and R@10; R@10 reaches 84.6%, 92.5%, and 93.2% on the PolProv1.0, EcoProv1.0, and TechProv1.0 datasets, respectively.
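
The abstract reports R@1, R@5, and R@10 but does not spell out how they are computed. The sketch below assumes one common reading of R@k: the fraction of documents for which at least one correct term appears among the top k retrieved terms. The function name `recall_at_k`, its parameters, and the toy terms are illustrative assumptions, not taken from the paper.

```python
from typing import Hashable, Sequence, Set


def recall_at_k(ranked: Sequence[Sequence[Hashable]],
                relevant: Sequence[Set[Hashable]],
                k: int) -> float:
    """Fraction of documents with at least one correct term in the top k.

    ranked[i]   -- terms retrieved for document i, best match first
    relevant[i] -- set of correct terms for document i
    Assumed definition of R@k; the paper may define it differently.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have the same length")
    if not ranked:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked, relevant)
        if any(term in gold for term in candidates[:k])
    )
    return hits / len(ranked)


# Hypothetical toy data: three documents, each with one correct term.
ranked = [
    ["tax", "trade", "budget"],   # correct term at rank 1
    ["trade", "tax", "budget"],   # correct term at rank 2
    ["budget", "trade", "tax"],   # correct term at rank 3
]
relevant = [{"tax"}, {"tax"}, {"tax"}]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(ranked, relevant, k):.3f}")
```

Under this definition R@k can only stay the same or increase as k grows, which is consistent with R@10 being the highest of the three reported figures.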

Keywords

document-term matching, unsupervised learning, automatic labeling, domain terms, case documents
