Recommendations for extending the GFF3 specification for improved interoperability of genomic data

Surya Saha, Scott Cain, Ethalinda K. S. Cannon, Nathan Dunn, Andrew Farmer, Zhi-Liang Hu, Gareth Maslen, Sierra Moxon, Christopher J Mungall, Rex Nelson, Monica F. Poelchau

arXiv (2022)

Abstract
The GFF3 format is a common, flexible tab-delimited format representing the structure and function of genes or other mapped features (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). However, with increasing re-use of annotation data, this flexibility has become an obstacle to standardized downstream processing. Common software packages that export annotations in GFF3 format model the same data and metadata in different notations, which puts the burden on end users to interpret the data model. The AgBioData consortium is a group of genomics, genetics and breeding databases and partners working towards shared practices and standards. Providing concrete guidelines for generating GFF3, and creating a standard representation of the most common biological data types, would greatly increase efficiency for AgBioData databases and for the genomics research community that uses the GFF3 format in its daily operations. The AgBioData GFF3 working group has developed recommendations to solve common problems in the GFF3 format. We suggest improvements for each of the GFF3 fields, as well as for the special cases of modeling functional annotations and standard protein-coding genes. We welcome further discussion of these recommendations. We invite the genomics and bioinformatics community to use the GitHub repository (https://github.com/NAL-i5K/AgBioData_GFF3_recommendation) to provide feedback via issues or pull requests.
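For readers less familiar with GFF3, the short Python sketch below illustrates the tab-delimited structure the abstract refers to. It is not taken from the paper and does not implement the AgBioData recommendations; the function names `parse_feature_line` and `parse_attributes` are hypothetical. It splits one feature line into the nine columns defined by the GFF3 specification and decodes column 9, whose `tag=value;tag=value` pairs may carry comma-separated, percent-encoded values. Because column 9 accepts arbitrary tags, it is one place where the notation differences between tools described above can arise. The sketch handles only a single feature line and ignores directives such as `##gff-version 3`, comment lines, and any `##FASTA` section.

```python
from __future__ import annotations

from urllib.parse import unquote

# The nine GFF3 columns, in the order the specification defines them.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_attributes(field: str) -> dict[str, list[str]]:
    """Split column 9 ("tag=value;tag=value") into a tag -> values mapping.

    A tag may carry several comma-separated values, and reserved characters
    are percent-encoded, so values are split on commas before unescaping.
    """
    attrs: dict[str, list[str]] = {}
    if field == ".":  # "." marks an empty column
        return attrs
    for pair in field.strip().rstrip(";").split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_feature_line(line: str) -> dict:
    """Parse a single GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record: dict = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive; store them as integers.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = parse_attributes(record["attributes"])
    return record
```

For example, applying `parse_feature_line` to the tab-separated gene line `ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN` (modeled on the gene example in the GFF3 specification) yields `start` 1000, `end` 9000 and attributes `{'ID': ['gene00001'], 'Name': ['EDEN']}`.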
Keywords
GFF3 specification, genomic data, interoperability