Scalable Learning of k-Dependence Bayesian Classifiers under MapReduce

TRUSTCOM-BIGDATASE-ISPA '15: Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 02(2015)

Abstract
In Data Mining there is a constant need for more scalable tools to tackle new domains of increasing complexity. Over the last few years, one of the main challenges in this field has been the growing size of the available data. Driven by the volume of data that emerging technologies generate and store, a range of new computational paradigms and parallel architectures have been proposed. Since its appearance, MapReduce has taken the leading role in Big Data applications, and many popular Data Analysis tools and techniques have been successfully adapted to this paradigm. Supervised classification is one of the most common problems in Data Mining, and Bayesian Network Classifiers (BNC) have become one of the most widespread and competitive techniques for addressing it. In this paper we propose a parallel definition of the KDB (k-dependence Bayesian classifier) algorithm under the MapReduce framework. We focus on obtaining maximum scalability and flexibility by exploring the concepts of vertical and horizontal parallelism, thus addressing both Big Data and high-dimensional problems simultaneously. We analyse its properties and the advantages of applying it to large datasets of varying nature. Finally, an experimental evaluation is performed by testing a Hadoop implementation of our proposal on a high-end computer cluster.
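The abstract does not spell out the parallel algorithm, so the following is only a rough, single-machine sketch of how KDB learning can be cast in MapReduce style. It assumes the usual KDB structure-learning procedure: rank attributes by I(X_i; C), then give each attribute up to k parents among the higher-ranked attributes, chosen by I(X_i; X_j | C) (the top-k variant, without a CMI threshold). It also assumes that horizontal parallelism means splitting instances across mappers and vertical parallelism means splitting attribute pairs across tasks; this is our interpretation, not taken from the paper. The function names `map_counts`, `reduce_counts` and `learn_kdb` are hypothetical, and this is not the authors' Hadoop implementation.

```python
from collections import Counter
from functools import reduce
from itertools import combinations
from math import log

# Hypothetical sketch: NOT the paper's Hadoop implementation.
# Each row is (attribute values..., class label); rows are split into
# partitions, one per mapper (our reading of "horizontal parallelism").

def map_counts(partition):
    """Map phase: emit the joint counts KDB needs for one data partition."""
    counts = Counter()
    for *x, c in partition:
        counts[("c", c)] += 1
        for i, xi in enumerate(x):
            counts[("xc", i, xi, c)] += 1
        for i, j in combinations(range(len(x)), 2):
            counts[("xxc", i, j, x[i], x[j], c)] += 1
    return counts

def reduce_counts(a, b):
    """Reduce phase: counts from different partitions are simply summed."""
    return a + b

def mutual_info(counts, n, i):
    """I(X_i; C) estimated from the aggregated counts."""
    px = Counter()
    for key, v in counts.items():
        if key[0] == "xc" and key[1] == i:
            px[key[2]] += v
    total = 0.0
    for key, v in counts.items():
        if key[0] == "xc" and key[1] == i:
            _, _, xi, c = key
            # p(x,c) / (p(x) p(c)) = n(x,c) * n / (n(x) * n(c))
            total += v / n * log(v * n / (px[xi] * counts[("c", c)]))
    return total

def cond_mutual_info(counts, n, i, j):
    """I(X_i; X_j | C) estimated from the aggregated counts (requires i < j)."""
    total = 0.0
    for key, v in counts.items():
        if key[0] == "xxc" and key[1] == i and key[2] == j:
            _, _, _, xi, xj, c = key
            nc = counts[("c", c)]
            total += v / n * log(v * nc / (counts[("xc", i, xi, c)] * counts[("xc", j, xj, c)]))
    return total

def learn_kdb(partitions, k):
    """Return the attribute order and each attribute's (at most k) attribute parents.

    The class C is implicitly a parent of every attribute. The MI/CMI scores
    are independent per attribute (pair) and could be spread across tasks
    ("vertical parallelism" under our reading); here they run sequentially.
    """
    counts = reduce(reduce_counts, map(map_counts, partitions))
    n = sum(v for key, v in counts.items() if key[0] == "c")
    n_attrs = len(partitions[0][0]) - 1
    order = sorted(range(n_attrs), key=lambda i: mutual_info(counts, n, i), reverse=True)
    parents = {}
    for pos, i in enumerate(order):
        candidates = order[:pos]  # only higher-ranked attributes may be parents
        scored = sorted(candidates,
                        key=lambda j: cond_mutual_info(counts, n, min(i, j), max(i, j)),
                        reverse=True)
        parents[i] = scored[:k]
    return order, parents

# Toy usage: two "mapper" partitions of hypothetical data, k = 1.
partitions = [
    [(0, 0, 1, "a"), (1, 1, 0, "b")],
    [(0, 0, 0, "a"), (1, 1, 1, "b")],
]
order, parents = learn_kdb(partitions, k=1)
print(order, parents)
```

Summing counts is what makes the map/reduce split work here: every statistic KDB needs is a sum over instances, so partial counts from any partitioning of the rows add up to the same totals as a single pass over the full dataset.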
Keywords
high-end computer cluster,Hadoop implementation,high dimensional problems,horizontal parallelism,vertical parallelism,KDB,BNC,supervised classification,data analysis tools,Big Data applications,parallel architectures,computational paradigm,storage capacity,data generation,data mining,MapReduce,k-dependence Bayesian network classifier,scalable learning