VADA: a Data-Driven Simulator for Nanopore Sequencing
arXiv (2024)
Abstract
Nanopore sequencing allows real-time analysis of long DNA
sequences at low cost, enabling new applications such as early detection of
cancer. Due to the complex nature of nanopore measurements and the high cost of
obtaining ground truth datasets, there is a need for nanopore simulators.
Existing simulators rely on handcrafted rules and parameters and do not learn
an internal representation that would allow for analysing underlying biological
factors of interest. Instead, we propose VADA, a purely data-driven method for
simulating nanopores based on an autoregressive latent variable model. We embed
subsequences of DNA and introduce a conditional prior to address the challenge
of a collapsing conditioning. We introduce an auxiliary regressor on the latent
variable to encourage our model to learn an informative latent representation.
We empirically demonstrate that our model achieves competitive simulation
performance on experimental nanopore data. Moreover, we show that the model learns an
informative latent representation that is predictive of the DNA labels. We
hypothesize that other biological factors of interest, beyond the DNA labels,
can potentially be extracted from such a learned latent representation.
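
The abstract names three ingredients: a latent variable model, a conditional prior, and an auxiliary regressor on the latent variable. As a rough illustration of how such terms are commonly combined into one training objective, the sketch below adds a reconstruction term, a KL term between an approximate posterior and a prior conditioned on the DNA context, and an auxiliary prediction term. This is a minimal sketch under stated assumptions, not the authors' implementation: the diagonal-Gaussian form of both distributions, the cross-entropy form of the auxiliary loss, the weights `beta` and `aux_weight`, and all function names are hypothetical and do not come from the paper.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions.

    q plays the role of the approximate posterior and p the role of a
    prior conditioned on the DNA context (both Gaussian by assumption).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (
            lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        )
    return total


def cross_entropy(probs, label):
    """Auxiliary loss (assumed form): negative log-probability of the true label."""
    return -math.log(probs[label])


def training_loss(recon_nll, kl, aux_loss, beta=1.0, aux_weight=1.0):
    """Weighted sum of the three terms; beta and aux_weight are hypothetical."""
    return recon_nll + beta * kl + aux_weight * aux_loss


# Toy numbers only, to show how the pieces fit together.
kl = gaussian_kl([1.0], [0.0], [0.0], [0.0])
aux = cross_entropy([0.7, 0.1, 0.1, 0.1], 0)
loss = training_loss(recon_nll=2.0, kl=kl, aux_loss=aux)
```

Conditioning the prior on the DNA context means the KL term penalizes the posterior only for deviating from what the context already predicts, while the auxiliary term rewards latents from which the labels can be recovered. How the paper weights or parameterizes these terms is not stated in the abstract.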