Comparative assessment of projection and clustering method combinations in the analysis of biomedical data

Research Square (Research Square)(2023)

引用 0|浏览0
暂无评分
摘要
Abstract Background Clustering on projected data is a common component of the analysis of biomedical research datasets. Among projection methods, principal component analysis (PCA) is the most commonly used. It focuses on the dispersion (variance) of the data, whereas clustering attempts to identify concentrations (neighborhoods) within the data. These may be conflicting aims. This report re-evaluates combinations of PCA and other common projection methods with common clustering algorithms. Methods PCA, independent component analysis (ICA), isomap, multidimensional scaling (MDS), and t-distributed stochastic neighborhood embedding (t-SNE) were combined with common clustering algorithms (partitioning: k-means, k-medoids, and hierarchical: single, Ward's, average linkage). Projections and clusterings were assessed visually by tessellating the two-dimensional projection plane with Voronoi cells and calculating common measures of cluster quality. Clustering on projected data was evaluated on nine artificial and five real biomedical datasets. Results None of the combinations always gave correct results in terms of capturing the prior classifications in the projections and clusters. Visual inspection of the results is therefore essential. PCA was never ranked first, but was consistently outperformed or equaled by neighborhood-based methods such as t-SNE or manifold learning techniques such as isomap. Conclusions The results do not support PCA as the standard projection method prior to clustering. Instead, several alternatives with visualization of the projection and clustering results should be compared. A visualization is proposed that uses a combination of Voronoi tessellation of the projection plane according to the clustering with a color coding of the projected data points according to the prior classes. This can be used to find the best combination of data projection and clustering in a given in a given data set.
更多
查看译文
关键词
projection,analysis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要