Evaluation of Gene Set Enrichment Analysis (GSEA) tools highlights the value of single sample approaches over pairwise for robust biological discovery.

Courtney Bull, Ryan M Byrne, Natalie C Fisher, Shania M Corry, Raheleh Amirkhah, Jessica Edwards, Lily Hillson, Mark Lawler, Aideen Ryan, Felicity Lamrock, Philip D Dunne, Sudhir B Malla

bioRxiv (2024)

Abstract
Background: Gene set enrichment analysis (GSEA) tools can be used to identify biological insights from transcriptional datasets and have become an integral analysis within gene expression-based cancer studies. Over the years, additional GSEA-based tools have been developed, providing the field with an ever-expanding range of options to choose from. Although several studies have compared the statistical performance of these tools, the downstream biological implications of choosing between pairwise and single sample forms of GSEA remain understudied.

Methods: In this study, we compare the statistical and biological interpretation of results obtained when using a variety of pre-ranking methods and options for pairwise GSEA and fast GSEA (fGSEA), alongside single sample GSEA (ssGSEA) and gene set variation analysis (GSVA). These analyses are applied to a well-established cohort of n=215 colon tumour samples, using the clinical feature of cancer recurrence status, non-relapse (NR) and relapse (R), as an initial exemplar, in conjunction with the Molecular Signatures Database Hallmark gene sets.

Results: Despite minor fluctuations in statistical performance, pairwise analysis returned remarkably similar results across a range of gene pre-ranking methods and across the choice of GSEA versus fGSEA, with the same well-established prognostic signatures consistently returned as significantly associated with relapse status. In contrast, when the same statistically significant signatures, such as Interferon Gamma Response, were assessed using ssGSEA and GSVA approaches, there was a complete absence of biological distinction between the two groups (NR and R).

Conclusions: The data presented here highlight how pairwise methods can overgeneralise biological enrichment within a group, assigning strong statistical significance to gene sets that may be inadvertently interpreted as equating to distinct biology. Importantly, single sample approaches allow users to clearly visualise and interpret statistical significance alongside biological distinction between samples within groups-of-interest, thus providing a more robust and reliable basis for discovery research.

Competing Interest Statement: The authors have declared no competing interest.
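To make the pairwise versus single sample contrast in the abstract concrete, the following is a minimal sketch of a running-sum enrichment score applied first to one pre-ranked gene list (the pairwise setting) and then to each sample separately. It is an illustration only, not the authors' pipeline and not the actual ssGSEA or GSVA algorithm: the function names `enrichment_score` and `per_sample_scores`, the gene names, and all values are hypothetical, and ranking each sample by its own expression values is a simplifying assumption.

```python
def enrichment_score(ranked_genes, weights, gene_set, p=1.0):
    """Weighted running-sum enrichment score for a pre-ranked gene list.

    ranked_genes: gene names ordered from top to bottom of the ranking.
    weights: ranking metric for each gene, in the same order.
    gene_set: set of gene names to test.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_total = sum(abs(w) ** p for w, hit in zip(weights, in_set) if hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running, best = 0.0, 0.0
    for w, hit in zip(weights, in_set):
        # Step up on a gene-set member (weighted), step down otherwise.
        running += (abs(w) ** p / hit_total) if hit else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running
    return best


def per_sample_scores(expression, gene_set, p=1.0):
    """Score every sample on its own (simplified single sample approach).

    expression: dict mapping sample name -> dict of gene name -> expression.
    Assumption: genes are ranked within each sample by expression value.
    """
    scores = {}
    for sample, values in expression.items():
        ranked = sorted(values, key=values.get, reverse=True)
        scores[sample] = enrichment_score(
            ranked, [values[g] for g in ranked], gene_set, p
        )
    return scores


# Hypothetical toy data: one group-level ranking and two individual samples.
toy_set = {"G1", "G2"}
pairwise_es = enrichment_score(
    ["G1", "G2", "G3", "G4", "G5", "G6"],
    [3.0, 2.0, 1.0, -1.0, -2.0, -3.0],
    toy_set,
)
sample_es = per_sample_scores(
    {
        "S1": {"G1": 9.0, "G2": 8.0, "G3": 1.0, "G4": 1.0},
        "S2": {"G1": 1.0, "G2": 1.0, "G3": 9.0, "G4": 8.0},
    },
    toy_set,
)
```

The pairwise call yields a single score for the whole comparison, whereas the per-sample call yields one score per sample, so the spread of scores within a group such as NR or R can be inspected directly rather than summarised by one group-level statistic.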