An Online Approach for DNA Sequencing Error Correction via Disk Based Index ∗


DNA sequencing has been widely used in many biological studies, such as gene expression analysis and biomedical diagnostics. To deal with the errors a sequencer introduces, researchers perform error correction as the first step of sequence analysis. To overcome the scalability problem of existing memory-based error correction methods, a disk-based sequencing error correction method called DiskBQcor was recently proposed. It uses a disk-based index tree to store k-mers from sequencing reads, and then analyzes the results of special box queries run on the index tree to efficiently correct sequencing errors. As an offline approach, DiskBQcor corrects errors only after all the k-mers of a given genome dataset have been inserted into the index tree. In this paper, we present an online approach to sequencing error correction, called DiskBQcor∗, that makes better use of computing resources. It extends DiskBQcor with an online analysis process in which sequencing errors are identified and corrected while the data are being processed. We discuss the relevant correction algorithms, statistical measures, and error identification strategies. Our experiments demonstrate that the proposed online method is promising for correcting errors in sequenced genome data stored on disk.

Keywords: DNA sequencing error correction, disk-based index structure, box query, online method.
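The box-query idea can be illustrated with a small in-memory sketch. It is not DiskBQcor∗ itself: a Python `Counter` and a linear scan stand in for the disk-based index tree, and the fixed count threshold and the names `OnlineCorrector`, `box_query`, and `correct_kmer` are hypothetical choices for illustration, not the paper's statistical measures or error identification strategies. Here a box query pins every base of a k-mer except one position, which may be any of A, C, G, T, so it returns the stored k-mers that differ from the query at most in that position.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

ALPHABET = "ACGT"


def kmers(read: str, k: int) -> List[str]:
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def box_query(counts: Dict[str, int], box: List[Set[str]]) -> Dict[str, int]:
    """Return stored k-mers whose i-th base lies in box[i] for every position i.

    Assumption: a linear scan stands in for the disk-based index tree.
    """
    return {km: c for km, c in counts.items()
            if all(base in allowed for base, allowed in zip(km, box))}


def correct_kmer(counts: Dict[str, int], kmer: str, threshold: int) -> Optional[str]:
    """Map a rare k-mer to the most frequent k-mer differing in one position.

    Assumption: "rare" simply means a count below a fixed threshold.
    """
    if counts.get(kmer, 0) >= threshold:
        return None  # frequent k-mers are treated as correct
    best, best_count = None, 0
    for pos in range(len(kmer)):
        # Box: every position pinned to the observed base except `pos`.
        box = [{base} for base in kmer]
        box[pos] = set(ALPHABET)
        for cand, c in box_query(counts, box).items():
            if cand != kmer and c >= threshold and c > best_count:
                best, best_count = cand, c
    return best


class OnlineCorrector:
    """Hypothetical online loop: insert reads as they arrive, analyze at any time."""

    def __init__(self, k: int, threshold: int) -> None:
        self.k = k
        self.threshold = threshold
        self.counts: Counter = Counter()

    def insert_read(self, read: str) -> None:
        self.counts.update(kmers(read, self.k))

    def corrections(self) -> Dict[str, str]:
        fixes: Dict[str, str] = {}
        for km in list(self.counts):
            fix = correct_kmer(self.counts, km, self.threshold)
            if fix is not None:
                fixes[km] = fix
        return fixes


if __name__ == "__main__":
    corrector = OnlineCorrector(k=4, threshold=3)
    corrector.insert_read("ACGAACGT")  # erroneous read: A instead of T at index 3
    print(corrector.corrections())     # {}
    for _ in range(5):
        corrector.insert_read("ACGTACGT")
    print(corrector.corrections())
    # {'ACGA': 'ACGT', 'CGAA': 'CGTA', 'GAAC': 'GTAC', 'AACG': 'TACG'}
```

Analyzing right after the erroneous read arrives yields no corrections, because no frequent neighbouring k-mer exists yet; once more reads have been inserted, the same call maps each k-mer that covers the error to its one-base neighbour. This toy version only hints at why an online method has to judge when enough evidence has accumulated.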