Abstract 1659: Overcorrection of Batch Effects by ComBat Can Be Avoided by Using an Equal Medians Method

John C. Obenauer, Thomas P. Stockfisch, Marcia V. Fournier

Bioinformatics, Convergence Science, and Systems Biology (2019)

Abstract
Combining multiple data sets from the Gene Expression Omnibus (GEO) or other data repositories for an integrated analysis requires appropriate batch correction. ComBat, an empirical Bayesian method for batch correction of microarray data, is widely used and has been reported to be the best correction method. We combined cancer data from 16 public studies representing 8 tissue types and a total of 3,563 samples, used the R “sva” package and ComBat for batch correction, and examined 6 gene sets representing positive and negative controls. As positive controls, we extracted 4 gene sets from the Human Protein Atlas that were found to be expressed at least 5-fold higher in one tissue than in any of 35 other tissues, and we matched these genes to their Affymetrix U133A probesets. This resulted in 16 probesets specific for stomach, 18 for lung, 37 for pancreas, and 27 for prostate. A fifth positive control is a group of 85 genes called BA80 that we have found to be expressed much lower in blood than in solid tissues. As a negative control that we do not expect to change much between tissues, we used a list of 3,804 housekeeping (HK) genes that were reported to show less than a four-fold expression change across 16 tissue types. We compared the ComBat results to a new method we call equal medians. The equal medians method assumes that the 22,277 genes measured on the Affymetrix U133A microarrays can vary widely between tissues and batches, but that the median of the 22,277 genes is the same for every sample. We created boxplots of each gene set across the 16 studies before and after each method of batch correction. The reduction in batch effects was scored using the change in standard deviation of the HK genes. The preservation of biological variability was scored using the fold change of the positive controls, comparing the target tissue’s median to the nearest alternate tissue’s median. We used two GEO studies as independent representatives of each tissue type, so the two fold changes were averaged to create a single measure.

The results using the HK genes showed that ComBat removed 99.90% of the batch effects visible in the raw data, while equal medians removed 61.58%. However, equal medians did the best at preserving biological variability, with a fold change of 4.8 for stomach, 13.1 for lung, 42.3 for pancreas, 12.0 for prostate, and 3.9 for blood. The corresponding fold changes for ComBat were 1.4, 1.1, 2.2, 1.0, and 1.0.

We conclude that ComBat was best at removing batch effects, but at the undesirable cost of minimizing biological variation. We believe this is due to known and unknown sources of variability that are confounded with batches, which is one of ComBat’s known risks. Equal medians showed the opposite performance, preserving biological variation better while partially removing batch effects. We offer the equal medians method as an alternative batch correction method in cases where ComBat shows evidence of overcorrection.

Citation Format: John C. Obenauer, Thomas P. Stockfisch, Marcia V. Fournier. Overcorrection of batch effects by ComBat can be avoided by using an equal medians method [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1659.
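
The abstract states only the assumption behind equal medians (every sample has the same median across all measured genes) and does not give an implementation. The Python sketch below is one possible reading of that assumption, not the authors' code: it assumes log-scale expression values in a genes-by-samples matrix and applies a per-sample additive shift so that all sample medians coincide. The function name `equal_medians`, the `target` parameter, and the toy data are hypothetical and introduced here for illustration.

```python
import numpy as np


def equal_medians(expr, target=None):
    """Shift each sample (column) so that all sample medians are equal.

    expr   : genes x samples array, assumed to be on a log scale
             (an assumption of this sketch, not stated in the abstract).
    target : common median to shift to; defaults to the median of the
             per-sample medians (also an assumption of this sketch).
    """
    expr = np.asarray(expr, dtype=float)
    sample_medians = np.median(expr, axis=0)  # one median per sample
    if target is None:
        target = np.median(sample_medians)
    # Additive shift per sample; differences between genes within a
    # sample are left untouched.
    return expr - sample_medians + target


# Toy data: 1000 genes x 6 samples, where the last three samples carry
# a constant batch offset of 1.5 log units.
rng = np.random.default_rng(0)
base = rng.normal(8.0, 2.0, size=(1000, 1))
noise = rng.normal(0.0, 0.1, size=(1000, 6))
batch_shift = np.array([0.0, 0.0, 0.0, 1.5, 1.5, 1.5])
raw = base + noise + batch_shift

corrected = equal_medians(raw)
medians_after = np.median(corrected, axis=0)
```

Because the shift is a single constant per sample, this kind of correction cannot alter the relative expression of genes within a sample, which is consistent with the abstract's observation that equal medians preserves tissue-specific fold changes while removing only part of the batch effect.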
Keywords
Gene Set Enrichment Analysis, Gene Expression