BpMatch: an efficient algorithm for a segmental analysis of genomic sequences.

IEEE/ACM Trans. Comput. Biology Bioinform. (2012)

Abstract
Here, we propose BpMatch: an algorithm that, working on a suitably modified suffix-tree data structure, quickly and efficiently computes the coverage of a source sequence S on a target sequence T, taking into account direct and reverse segments, possibly overlapping. Using BpMatch, the operator defines a priori the minimum length l of a segment and the minimum number of occurrences minRep, so that only segments longer than l and occurring more than minRep times are considered significant. BpMatch outputs the significant segments found and the computed segment-based distance. In the worst case, assuming the alphabet dimension d is a constant, the time required by BpMatch to calculate the coverage is O(l²n). On average, by setting l ≥ 2·log_d(n), the time required to calculate the coverage is only O(n). Thanks to the minRep parameter, BpMatch can also be used to perform a self-covering: covering a sequence with segments taken from the sequence itself, while avoiding the trivial solution of a single segment coinciding with the whole sequence. The result of the self-covering approach is a spectral representation of the repeats contained in the sequence. BpMatch is freely available at www.sourceforge.net/projects/bpmatch.
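To make the notion of coverage concrete, here is a minimal brute-force sketch. It is not BpMatch and does not use a suffix tree; it simply hashes every length-l window of S. It rests on several assumptions not stated in the abstract: sequences are DNA over {A, C, G, T}, "reverse segments" are read as reverse complements, and l and minRep are treated as inclusive minimums. The names `coverage` and `reverse_complement` are hypothetical. Because any segment of length at least l that occurs at least minRep times is a union of length-l windows that each occur at least that often, marking qualifying l-windows covers exactly the same positions of T. As a worked example of the average-case bound, if n is taken to be the sequence length (the abstract does not define it) and d = 4 for DNA, a sequence of 10^6 bases gives 2·log_4(10^6) ≈ 2 × 9.97 ≈ 19.9, so l ≥ 20.

```python
from collections import Counter

# Assumption: reverse segments are reverse complements over the DNA alphabet.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over {A, C, G, T}."""
    return seq.translate(_COMPLEMENT)[::-1]


def coverage(source: str, target: str, l: int, min_rep: int,
             use_reverse: bool = True) -> float:
    """Hypothetical brute-force coverage (not BpMatch): fraction of target
    positions covered by segments of length >= l occurring >= min_rep
    times in source, directly or (optionally) as a reverse complement."""
    # Count every length-l window of the source, overlapping occurrences included.
    counts = Counter(source[i:i + l] for i in range(len(source) - l + 1))

    def occurrences(window: str) -> int:
        n = counts[window]  # Counter returns 0 for unseen windows
        if use_reverse:
            rc = reverse_complement(window)
            if rc != window:  # do not count a palindromic window twice
                n += counts[rc]
        return n

    covered = [False] * len(target)
    for i in range(len(target) - l + 1):
        # A qualifying l-window marks all of its l positions as covered.
        if occurrences(target[i:i + l]) >= min_rep:
            for j in range(i, i + l):
                covered[j] = True
    return sum(covered) / len(target) if target else 0.0
```

A self-covering in this sketch is just `coverage(s, s, l, min_rep=2)`: with min_rep ≥ 2, each window of s must occur somewhere besides its own position, which rules out the trivial match of the sequence with itself. For `s = "GATTACAGATTA"` and `l = 4`, only the repeated windows `GATT` and `ATTA` qualify, leaving the two central positions uncovered. Slicing makes this sketch roughly O((|S| + |T|)·l) in time and memory, and it does not report the segments or the segment-based distance that BpMatch outputs.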
Keywords
bioinformatics, data structures, genomics, molecular biophysics, molecular configurations, trees (mathematics), BpMatch, alphabet dimension, direct segments, genomic sequence segmental analysis algorithm, reverse segments, segment based distance, self covering, source sequence coverage, suffix tree data structure, target sequence, Segmental analysis, coverage index, genomic sequences, inverted repeats, repeats