A Spark-Based Open Source Framework for Large-Scale Parallel Processing of Rich Text Documents

Qiang Chen, Yinong Chen, Sheng Wu, Zili Zhang

2021 8th International Conference on Future Internet of Things and Cloud (FiCloud) (2021)

Abstract

The large amount of unstructured educational data is an important data source of educational big data analysis. However, there is a lack of efficient and simple distributed solutions to process these unstructured documents. To address this problem, this study proposes a Spark-based parallel processing framework of large-scale rich text documents (mainly as MS Word documents), which can seamlessly ...

Keywords

educational big data, rich text document processing, Spark, distributed framework, HDFS, open source
