scFlow: A Scalable and Reproducible Analysis Pipeline for Single-Cell RNA Sequencing Data

bioRxiv (2021)

Abstract
Advances in single-cell RNA-sequencing technology over the last decade have enabled exponential increases in throughput: datasets with over a million cells are becoming commonplace. The burgeoning scale of data generation, combined with the proliferation of alternative analysis methods, led us to develop the scFlow toolkit and the nf-core/scflow pipeline for reproducible, efficient, and scalable analyses of single-cell and single-nuclei RNA-sequencing data. The scFlow toolkit provides a higher level of abstraction on top of popular single-cell packages within the R ecosystem, while the nf-core/scflow Nextflow pipeline is built within the nf-core framework to enable compute infrastructure-independent deployment across institutions and research facilities. Here we present our flexible pipeline, which leverages the advantages of containerization and the potential of cloud computing for easy orchestration and scaling of the analysis of large case/control datasets, even by non-expert users. We demonstrate the functionality of the analysis pipeline, from sparse-matrix quality control through to insight discovery, with example analyses of four recently published public datasets, and we describe the extensibility of scFlow as a modular, open-source tool for single-cell and single-nuclei bioinformatic analyses.

Competing Interest Statement
PMM has received consultancy fees from Roche, Adelphi Communications, Celgene, Neurodiem and Medscape. He has received honoraria or speakers' fees from Novartis and Biogen and has received research or educational funds from Biogen, Novartis and GlaxoSmithKline.
Keywords
RNA, reproducible analysis pipeline, single-cell