Data Matrix Normalization and Merging Strategies Minimize Batch-specific Systemic Variation in scRNA-Seq Data

bioRxiv (2021)

Abstract
Single-cell RNA sequencing (scRNA-seq) can reveal accurate and sensitive RNA abundance in a single sample, but robust integration of multiple samples remains challenging. Large-scale scRNA-seq data generated by different workflows or laboratories can contain batch-specific systemic variation. Such variation challenges data integration by confounding sample-specific biology with undesirable batch-specific systemic effects. Therefore, there is a need for guidance in selecting computational and experimental approaches to minimize batch-specific impacts on data interpretation and a need to empirically evaluate the sources of systemic variation in a given dataset. To uncover the contributions of experimental variables to systemic variation, we intentionally perturb four potential sources of batch effect in five human peripheral blood samples. We investigate sequencing replicate, sequencing depth, sample replicate, and the effects of pooling libraries for concurrent sequencing. To quantify the downstream effects of these variables on data interpretation, we introduce a new scoring metric, the Cell Misclassification Statistic (CMS), which identifies losses to cell type fidelity that occur when merging datasets of different batches. CMS reveals an undesirable overcorrection by popular batch-effect correction and data integration methods. We show that optimizing gene expression matrix normalization and merging can reduce the need for batch-effect correction and minimize the risk of overcorrecting true biological differences between samples.

Competing Interest Statement: The authors have declared no competing interest.
Keywords
batch-specific, scRNA-seq