Progressive clustering of big data with GPU acceleration and visualization

Jun Wang, Eric Papenhausen, Bing Wang, Sungsoo Ha, Alla Zelenyuk, Klaus Mueller

2017 New York Scientific Data Summit (NYSDS) (2017)

Abstract

Clustering has become an unavoidable step in big data analysis. It can arrange data into a compact form, making operations on big data manageable. However, clustering big data requires not only the capability to handle large volumes of high-dimensional data, but also the ability to process streaming data, and most current algorithms are underdeveloped in all of these respects. Furthermore, big data processing is seldom interactive, which conflicts with users' desire for immediate answers. The best one can do is to process the data incrementally, so that partial and, ideally, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework that uses Multi-Dimensional Scaling for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles, comprising 8 million data points of 450 dimensions each.
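
The idea of incremental processing with progressively refined partial results can be illustrated with a minimal sketch. The code below is not the framework proposed in the paper (which uses Multi-Dimensional Scaling and GPU acceleration); it is a plain online k-means in standard-library Python, chosen here only as an assumption to show the pattern. The function name `progressive_kmeans`, the chunked input, and the seeding of centroids from the first `k` points are all hypothetical choices made for illustration.

```python
import random


def progressive_kmeans(chunks, k):
    """Yield a snapshot of the centroids after each chunk of streamed points.

    Illustrative online k-means (an assumption, not the paper's method):
    each centroid is the running mean of the points assigned to it so far.
    """
    centroids, counts = [], []
    for chunk in chunks:
        for point in chunk:
            if len(centroids) < k:
                # Seed the centroids with the first k points of the stream.
                centroids.append(list(point))
                counts.append(1)
                continue
            # Assign the point to the nearest centroid (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], point)),
            )
            # Move that centroid toward the point; a 1/n step gives a running mean.
            counts[j] += 1
            step = 1.0 / counts[j]
            centroids[j] = [a + step * (b - a) for a, b in zip(centroids[j], point)]
        # A partial result is available here, before the stream is exhausted.
        yield [list(c) for c in centroids]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: points alternate between two well-separated 2-D blobs.
    points = []
    for _ in range(100):
        points.append((rng.gauss(0, 0.1), rng.gauss(0, 0.1)))
        points.append((10 + rng.gauss(0, 0.1), 10 + rng.gauss(0, 0.1)))
    chunks = [points[i:i + 50] for i in range(0, len(points), 50)]
    for step, snapshot in enumerate(progressive_kmeans(chunks, k=2), start=1):
        print(f"after chunk {step}: {snapshot}")
```

Because the function is a generator, a caller can display or inspect each snapshot while later chunks are still arriving, which is the interactive behavior the abstract argues for.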

Keywords

clustering, big data, GPU, visualization
