COATi: statistical pairwise alignment of protein-coding sequences

Juan Jose Garcia Mesa, Ziqi Zhu, Reed Austin Cartwright

bioRxiv (Cold Spring Harbor Laboratory) (2024)

Abstract
Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequence artifacts and errors made during alignment reconstruction can impact downstream analyses, leading to erroneous conclusions in comparative and functional genomic studies. For example, abiological frameshifts and early stop codons are common artifacts found in protein-coding sequences that have been annotated in reference genomes. While such errors are eventually fixed in the reference genomes of model organisms, many genomes used by researchers contain these artifacts, and researchers often discard large amounts of data in comparative genomic studies to prevent artifacts from impacting results. To address this need, we present COATi, a statistical, codon-aware pairwise aligner that supports complex insertion-deletion models and can handle artifacts present in genomic data. COATi allows users to reduce the amount of discarded data while generating more accurate sequence alignments.

Competing Interest Statement: The authors have declared no competing interest.
Keywords
statistical pairwise alignment, protein, sequences