
Causes and Analytical Impacts of Missing Data in RADseq Phylogenetics: Insights from an African Frog (Afrixalus)

Zoologica Scripta (2019)

Abstract
Restriction site‐associated DNA sequencing (RADseq) has emerged as a useful tool in systematics and population genomics. A common feature of RADseq data sets is that they contain missing data that arise from multiple sources including genealogical sampling bias, assembly methodology and sequencing error. Many RADseq studies have demonstrated that allowing sites (single nucleotide polymorphisms, SNPs) with missing data can increase support for phylogenetic hypotheses. Two non‐mutually exclusive explanations for this observation are that (a) larger data sets contain more phylogenetic information; and (b) excluding missing data disproportionally removes sites with the highest mutation rates, causing the exclusion of characters that are likely variable and informative. Using a RADseq data set derived from the East African banana frog, Afrixalus fornasini (up to 1.1 million SNPs), we found that missing data thresholds were positively correlated with the proportion of parsimony‐informative sites and mean branch support. Using three proxies for estimating site‐specific rate, we found that the most conservative missing data strategies excluded rapidly evolving sites, with four‐state sites present only when allowing ≥60% missing data per SNP. Topological similarity among estimated phylogenies was highest for the data sets with ≥60% missing data per SNP. Our results suggest that several desirable phylogenetic qualities were observed when allowing ≥60% missing data per SNP. However, at the highest missing data thresholds (80% and 90% missing data per SNP), we observed differences in performance between high‐ and mixed‐weight DNA extraction samples, which may indicate there are trade‐offs to consider when using degraded genomic template with RADseq protocols.
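To make the idea of a per-SNP missing data threshold concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy alignment held as a dictionary of equal-length strings, treats `N`, `-` and `?` as missing, and uses hypothetical helper names (`missing_fraction`, `is_parsimony_informative`, `summarize`) and a hypothetical `max_missing` parameter. A site is counted as parsimony-informative when at least two nucleotide states each occur in at least two samples.

```python
from typing import Dict, List

# Characters treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def missing_fraction(column: str) -> float:
    """Fraction of samples with no called base at this site."""
    return sum(c in MISSING for c in column) / len(column)


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states are each seen in at least two samples."""
    counts: Dict[str, int] = {}
    for c in column:
        if c in "ACGT":
            counts[c] = counts.get(c, 0) + 1
    return sum(1 for n in counts.values() if n >= 2) >= 2


def filter_sites(alignment: Dict[str, str], max_missing: float) -> List[str]:
    """Keep alignment columns whose missing fraction is <= max_missing."""
    columns = ["".join(col) for col in zip(*alignment.values())]
    return [col for col in columns if missing_fraction(col) <= max_missing]


def summarize(alignment: Dict[str, str], max_missing: float) -> Dict[str, float]:
    """Count retained sites and the proportion that are parsimony-informative."""
    kept = filter_sites(alignment, max_missing)
    informative = sum(is_parsimony_informative(c) for c in kept)
    proportion = informative / len(kept) if kept else 0.0
    return {"sites_kept": len(kept), "informative": informative,
            "proportion_informative": proportion}


# Toy data: six samples, four sites (hypothetical, for illustration only).
toy_alignment = {
    "s1": "AACA",
    "s2": "AACN",
    "s3": "AGTN",
    "s4": "AGTN",
    "s5": "AGNN",
    "s6": "AGNN",
}

for threshold in (0.0, 0.5, 1.0):
    print(threshold, summarize(toy_alignment, threshold))
```

In this toy case, raising the threshold from 0.0 to 0.5 retains one additional variable site, whereas raising it further only adds a site that is almost entirely missing. Real RADseq data sets would normally be filtered with the assembly software's own options rather than a hand-written function like this.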