Linking Between Molecular and Biodiversity Data: A BiCIKL Perspective

Joana Paupério, Vikas Gupta, Vishnukumar Balavenkataraman Kadhirvelu, Kessy Abarenkov, Wouter Addink, Donat Agosti, Olaf Bánki, Josephine Burgin, Marcus Ernst, Tobias Frøslev, Quentin Groom, Anton Güntsch, Suran Jayathilaka, Sam Leeflang, Urmas Kõljalg, Joe Miller, Guido Sautter, Lyubomir Penev, Guy Cochrane

Biodiversity Information Science and Standards (2024)

Abstract

Molecular sequencing data generation is being driven by global and regional efforts to discover, understand and monitor biodiversity. To fully explore these data in biodiversity research, we need a network of connected data resources that links sequence data with natural history collections, taxonomy and literature. The BiCIKL project (Biodiversity Community Integrated Knowledge Library, Penev et al. 2022) has laid the groundwork for creating this network of linked data and for fostering FAIR (Findable, Accessible, Interoperable and Reusable) practices in the biodiversity domain. Connecting biodiversity and molecular data along the biodiversity research cycle requires a foundation of well-structured, rich metadata in the molecular sequence databases. Referencing the physical specimens is important because it provides context about the source of the material used to generate the molecular sequence data, including its origin and species identification. To connect biodiversity and molecular data, we developed tools and workflows for improving and standardising metadata. We also developed federated searches and validation of specimen references in sequence data. Examples include the SpASe tool, which enables the discovery of links between natural history collections and sequences, and the European Nucleotide Archive Source Attribute Helper API, which facilitates the construction of specimen attributes in a structured format. This work was done in close collaboration with DiSSCo (Distributed System of Scientific Collections) and several biodiversity genomics projects (e.g. Biodiversity Genomics Europe, BGE). Furthermore, we enabled community curation of biological source annotations, such as specimen references in sequence data, through the PlutoF platform and the ELIXIR Contextual Data Clearinghouse (Abarenkov et al. 2021, Balavenkataraman Kadhirvelu et al. 2022). We also increased bidirectional linking from sequences in the European Nucleotide Archive (ENA) to collections, taxonomy and literature services (e.g. Plazi TreatmentBank, OpenBioDiv). In addition, we worked closely with the community to enable the structured publication of environmental DNA data, promoting and engaging in the definition of standards and developing tools to facilitate data deposition and retrieval. Overall, the project has contributed significantly to strengthening the connections between the biodiversity and genomics communities, towards greater data integration and interoperability. Structured, enriched, accessible and linked sequence data will provide a strong foundation for applying biodiversity knowledge in response to global challenges such as biodiversity loss, ecosystem change and food security. Beyond BiCIKL, we will continue our work as a community to promote a culture of FAIR linked molecular data, towards a fully integrated biodiversity knowledge ecosystem.
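The following sketch illustrates what a "structured format" for a specimen reference can look like. It builds and parses a value that follows the INSDC `specimen_voucher` qualifier convention, `[<institution-code>:[<collection-code>:]]<specimen_id>`. It is not the ENA Source Attribute Helper API itself, whose endpoints and parameters are not described in this abstract. The function names `build_specimen_voucher` and `parse_specimen_voucher` and the codes `ABC` and `XYZ` are hypothetical.

```python
from typing import Optional


def build_specimen_voucher(
    specimen_id: str,
    institution_code: Optional[str] = None,
    collection_code: Optional[str] = None,
) -> str:
    """Hypothetical helper: build an INSDC-style specimen_voucher value.

    Format: [<institution-code>:[<collection-code>:]]<specimen_id>
    """
    if not specimen_id:
        raise ValueError("specimen_id is required")
    # A collection code is only meaningful when qualified by an institution code.
    if collection_code and not institution_code:
        raise ValueError("collection_code requires institution_code")
    parts = [p for p in (institution_code, collection_code) if p]
    parts.append(specimen_id)
    return ":".join(parts)


def parse_specimen_voucher(value: str) -> dict:
    """Hypothetical helper: split a specimen_voucher value into its parts.

    Assumes at most two leading colon-separated codes; identifiers that
    themselves contain colons are ambiguous in this simple scheme.
    """
    parts = value.split(":", 2)
    if len(parts) == 3:
        institution, collection, specimen_id = parts
    elif len(parts) == 2:
        institution, specimen_id = parts
        collection = None
    else:
        institution = collection = None
        specimen_id = parts[0]
    return {
        "institution_code": institution,
        "collection_code": collection,
        "specimen_id": specimen_id,
    }


if __name__ == "__main__":
    voucher = build_specimen_voucher("12345", "ABC", "XYZ")
    print(voucher)                          # ABC:XYZ:12345
    print(parse_specimen_voucher(voucher))
```

Keeping the institution and collection codes as separate, validated fields is what allows a reference like this to be matched against collection registries instead of remaining free text.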
