Ontological Optimization for Latent Semantic Indexing of Arabic Corpus.

Procedia Computer Science(2018)

Abstract
Dimensionality reduction is a critical problem in the information retrieval process. Higher dimensionality directly affects search performance in terms of Recall and Precision. Dimensionality reduction also enables the search to be semantically based rather than lexically based, because the dimensions are defined in terms of semantic concepts instead of traditional terms or keywords. Latent Semantic Indexing (LSI) is a mathematical extension of the classical Vector Space Model (VSM). LSI discovers the latent semantics in the search space by extracting concepts from the original terms in that space. It relies on Singular Value Decomposition (SVD) to reduce the term space to a lower-dimensional LSI space. In this paper, we propose a methodology for further optimizing LSI dimension reduction via two reduction levels. The first reduction level is based on an ontological conceptualization process: the Universal WordNet ontology (UWN) is used to build an ontology-based concept space in place of the term space. At the second reduction level, SVD is applied to the extracted concept space to obtain an optimal LSI conceptualization. The experimental results indicate an improvement in the search results in terms of both Precision and Recall, as the proposed methodology addresses the Synonymy and Polysemy problems effectively.
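The two reduction levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `TERM_TO_CONCEPT` dictionary is a hypothetical stand-in for a UWN lookup, the English terms and the toy term-document counts are invented for illustration, and raw counts are used in place of any weighting scheme the paper may apply.

```python
import numpy as np

# Hypothetical term -> concept mapping; a stand-in for a UWN lookup.
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "doctor": "physician",
    "physician": "physician",
    "hospital": "hospital",
}


def concept_matrix(term_doc, terms, term_to_concept):
    """Level 1: merge term rows that map to the same concept."""
    concepts = sorted(set(term_to_concept[t] for t in terms))
    index = {c: i for i, c in enumerate(concepts)}
    out = np.zeros((len(concepts), term_doc.shape[1]))
    for row, term in enumerate(terms):
        out[index[term_to_concept[term]]] += term_doc[row]
    return out, concepts


def lsi_reduce(matrix, k):
    """Level 2: truncated SVD keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy term-document counts (rows: terms, columns: documents); invented data.
terms = ["car", "automobile", "truck", "doctor", "physician", "hospital"]
term_doc = np.array([
    [2, 0, 0, 0],
    [0, 3, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

C, concepts = concept_matrix(term_doc, terms, TERM_TO_CONCEPT)
U_k, s_k, Vt_k = lsi_reduce(C, k=2)
doc_vectors = (np.diag(s_k) @ Vt_k).T  # one row per document in LSI space

# Documents 0 and 1 use different synonyms ("car" vs "automobile").
sim_terms = cosine(term_doc[:, 0], term_doc[:, 1])
sim_lsi = cosine(doc_vectors[0], doc_vectors[1])
print(concepts, C.shape, doc_vectors.shape)
print(round(sim_terms, 3), round(sim_lsi, 3))
```

In this toy setup the two documents that use different synonyms have a low cosine similarity in the raw term space and a much higher one after concept merging and SVD, which is the kind of synonymy effect the abstract refers to. Handling polysemy would require sense disambiguation during the concept lookup, which this sketch does not attempt.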
Keywords
LSI,Universal WordNet Ontology,Vector Models,Indexing,Weighting approaches,Dimensionality Reduction,Arabic Text