Stemming Algorithm for Arabic Text Using a Parallel Data Processing

THIRD INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY (2019)

Abstract

The fast-growing volume of data generated over networks confronts data mining algorithms with major difficulties, namely storing the data and handling the computational challenges related to data volume and scalability. Our interest is focused on analyzing Arabic-language data sets, which exhibit morphological complexity and dialectal variety; these specificities require advanced preprocessing, in particular a word-stemming step. In this paper, to achieve effective Arabic information retrieval, we use the MapReduce model and perform experiments on rankings generated by our optimized stemming algorithm, which is based on the Khoja algorithm, a popular algorithm for stemming Arabic words. We propose a structure based on key-value pairs to speed up the stemming phase, and we parallelize the process using the MapReduce mechanism.

Keywords

Information retrieval, Hadoop, MapReduce, Big data, Khoja stemmer, Arabic text