Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2022)

Abstract
The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, which makes them memory-bound. We analyze a recent implementation of the FM-index and highlight existing throughput-memory trade-offs, showing that memory requirements limit the implementation of large k-steps. We propose COFI, a COmpressed FM-Index for large k-steps. COFI enables a 15-step FM-index using less than 16 GB for a human genome reference of 3 gigabase pairs. An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) system and a Skylake-based system (SKX). We achieve average speed-ups of 1.46× and 1.39×, respectively, over a state-of-the-art FM-index implementation that is already well optimized.
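For readers unfamiliar with the data structure, the following is a minimal sketch of a textbook 1-step FM-index with backward search. It is not the COFI layout and not the implementation evaluated in the paper; the function names `build_fm_index` and `count_occurrences` are hypothetical, and the naive suffix sorting is only suitable for toy inputs. It illustrates why the search is memory-bound: each iteration consumes one symbol and performs two occurrence-table lookups at positions that depend on the previous iteration. A k-step index consumes k symbols per iteration instead, at the cost of a larger table.

```python
def build_fm_index(text):
    """Build a toy FM-index (C array and full occurrence table) for `text`."""
    text += "$"  # sentinel, lexicographically smaller than A, C, G, T
    # Naive suffix array: fine for a sketch, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def count_occurrences(pattern, C, occ):
    """Count exact matches of `pattern` using 1-step backward search."""
    # Half-open suffix-array interval [lo, hi) covering all suffixes.
    lo, hi = 0, len(next(iter(occ.values()))) - 1
    for ch in reversed(pattern):  # one symbol per step
        if ch not in C:
            return 0
        # Two data-dependent lookups per step: the irregular accesses.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ = build_fm_index("ACGTACGTAC")
print(count_occurrences("AC", C, occ))
```

In this sketch the occurrence table is stored in full, which is what becomes prohibitively large as the step size grows; practical implementations sample or compress it.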
Keywords
Algorithms; Genome, Human; Genomics; High-Throughput Nucleotide Sequencing; Humans; Sequence Alignment; Sequence Analysis, DNA; Software