A Spark-Based Open Source Framework for Large-Scale Parallel Processing of Rich Text Documents

2021 8th International Conference on Future Internet of Things and Cloud (FiCloud) (2021)

Abstract
The large volume of unstructured educational data is an important data source for educational big data analysis. However, efficient and simple distributed solutions for processing these unstructured documents are lacking. To address this problem, this study proposes a Spark-based framework for the parallel processing of large-scale rich text documents (mainly MS Word documents), which can seamlessly ...
Keywords
educational big data, rich text document processing, Spark, distributed framework, HDFS, open source