scMARK an 'MNIST' like benchmark to evaluate and optimize models for unifying scRNA data

Swechha Singh, Dylan Mendonca, Octavian Focsa, Juan Javier Diaz-Mejia, Sam Cooper

bioRxiv (2021)

Abstract
Today's single-cell RNA analysis tools provide enormous value in enabling researchers to make sense of large single-cell RNA (scRNA) studies, yet their ability to integrate different studies at scale remains untested. Here we present a novel benchmark dataset (scMARK) that consists of 100,000 cells over 10 studies and can test how well models unify data from different scRNA studies. We also introduce a two-step framework that uses supervised models to evaluate how well unsupervised models integrate scRNA data from the 10 studies. Using this framework, we show that the variational autoencoder scVI is the only tool tested that can integrate scRNA studies at scale. Overall, this work paves the way to creating large scRNA atlases and 'off-the-shelf' analysis tools.

Competing Interest Statement

All authors are employees of Phenomic AI Inc., a company focused on developing new therapeutics against the tumor stroma. S.C. is a founder, shareholder, and board member of Phenomic AI Inc.
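The two-step evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes PCA as a stand-in for the unsupervised model, a nearest-centroid classifier as a stand-in for the supervised model, a leave-one-study-out split, and synthetic data; the function names (`pca_embed`, `nearest_centroid_predict`, `leave_one_study_out_accuracy`) are hypothetical.

```python
import numpy as np

def pca_embed(X, n_components=2):
    """Step 1 (unsupervised stand-in): project cells onto the top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def nearest_centroid_predict(train_Z, train_y, test_Z):
    """Step 2 (supervised stand-in): assign each test cell to the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_Z[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def leave_one_study_out_accuracy(Z, cell_types, studies):
    """Train on all studies but one, then score cell-type accuracy on the held-out study."""
    scores = {}
    for s in np.unique(studies):
        held_out = studies == s
        pred = nearest_centroid_predict(Z[~held_out], cell_types[~held_out], Z[held_out])
        scores[int(s)] = float((pred == cell_types[held_out]).mean())
    return scores

# Synthetic toy data (assumption): 3 studies x 3 cell types x 50 cells, 20 genes,
# with a mild per-study offset mimicking a batch effect.
rng = np.random.default_rng(0)
n_genes, n_per_group = 20, 50
type_means = rng.normal(0, 5, size=(3, n_genes))
X_parts, y_parts, study_parts = [], [], []
for s in range(3):
    batch_shift = rng.normal(0, 0.5, size=n_genes)
    for t in range(3):
        X_parts.append(type_means[t] + batch_shift + rng.normal(0, 1, size=(n_per_group, n_genes)))
        y_parts.append(np.full(n_per_group, t))
        study_parts.append(np.full(n_per_group, s))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
study = np.concatenate(study_parts)

Z = pca_embed(X, n_components=2)
scores = leave_one_study_out_accuracy(Z, y, study)
print(scores)
```

Under these assumptions, a high held-out accuracy suggests the embedding places the same cell type from different studies close together, while a low score suggests study-specific effects dominate the embedding.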
Keywords
scRNA data, benchmark, MNIST, optimize models