Spreading fuzzy random forests with MapReduce

2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016

Abstract
Random forests are currently considered among the most accurate and efficient classifiers. Moreover, fuzzy implementations of random forests have recently been proposed to exploit the ability of fuzzy decision trees to cope with uncertain data. When the size of training sets grows substantially, as happens with Big Data, ordinary implementations of classifiers become inadequate, and fuzzy random forests are no exception. In this paper, we consider a method that generates fuzzy partitions of the continuous attributes during decision tree learning, and we propose a distributed implementation of fuzzy random forests based on this method. The implementation relies on the MapReduce programming model and the Apache Hadoop framework. We show that this model can easily accommodate an effective distribution strategy for the computation, yielding good scalability figures. The novel distributed algorithm enables fuzzy random forests to deal with extremely large data sets, in both the learning and the classification phases, thus fostering their applicability in the modern scenario of increasingly frequent data deluges.
Keywords
Fuzzy Random Forests, MapReduce, Distributed Fuzzy Classifiers, Big Data, Fuzzy Decision Trees, Hadoop
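
To make the MapReduce idea in the abstract more concrete, the following is a minimal sketch of how one statistic typically needed by fuzzy decision tree learning, the fuzzy cardinality of each class within each fuzzy set, could be computed in a map phase and merged in a reduce phase. It is not the paper's algorithm: the abstract does not describe the actual distribution strategy, and the paper generates fuzzy partitions during tree learning, whereas this sketch assumes a fixed triangular partition. The names `FUZZY_SETS`, `map_chunk`, `reduce_counts`, and `fuzzy_cardinalities` are hypothetical, and the phases run in a single Python process rather than on Hadoop.

```python
from collections import defaultdict
from functools import reduce


def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a and c and peak b."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


# Assumption: a fixed strong fuzzy partition of one attribute scaled to [0, 1].
# The paper instead derives partitions while the tree is being learned.
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def map_chunk(chunk):
    """Map phase: partial fuzzy cardinalities for one data chunk.

    Each record is a (attribute value, class label) pair; the result maps
    (fuzzy set name, class label) to the summed membership degrees.
    """
    partial = defaultdict(float)
    for value, label in chunk:
        for name, (a, b, c) in FUZZY_SETS.items():
            mu = triangular(value, a, b, c)
            if mu > 0.0:
                partial[(name, label)] += mu
    return partial


def reduce_counts(left, right):
    """Reduce phase: merge two partial results by summing values per key."""
    merged = defaultdict(float, left)
    for key, weight in right.items():
        merged[key] += weight
    return merged


def fuzzy_cardinalities(chunks):
    """Run the map phase on every chunk, then fold the partial results."""
    return dict(reduce(reduce_counts, map(map_chunk, chunks), defaultdict(float)))


if __name__ == "__main__":
    # Two chunks stand in for two input splits handled by separate mappers.
    chunks = [
        [(0.0, "A"), (0.25, "A")],
        [(0.75, "B"), (1.0, "B")],
    ]
    for key, weight in sorted(fuzzy_cardinalities(chunks).items()):
        print(key, weight)
```

Because the per-key sums are associative and commutative, the chunks can be processed in any order and on any number of workers before merging, which is the property that makes this kind of statistic a natural fit for MapReduce.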