The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF

Joshua Klein, Henry Lam, Juan Antonio Vizcaino, Eric W. Deutsch

Analytical Chemistry (2024)


Abstract

Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary, and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications, with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for non-peptidic analytes.
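The idea of one data model with several interchangeable serializations can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the mzSpecLib specification: the class names, the field names, and the line-based text layout are hypothetical. The two accessions used in the usage example, MS:1000041 (charge state) and MS:1000744 (selected ion m/z), are terms from the PSI-MS controlled vocabulary, and the peak annotation is carried as an opaque string rather than parsed as mzPAF.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Attribute:
    """One metadata item keyed by a controlled-vocabulary term."""
    accession: str  # CV accession, e.g. "MS:1000041"
    name: str       # CV term name, e.g. "charge state"
    value: str      # value kept as text so every serialization round-trips


@dataclass
class Peak:
    mz: float
    intensity: float
    annotation: str = ""  # opaque annotation string; not parsed in this sketch


@dataclass
class LibraryEntry:
    key: int
    attributes: list = field(default_factory=list)
    peaks: list = field(default_factory=list)


def to_json(entry):
    """Serialize the in-memory model to JSON."""
    return json.dumps(asdict(entry), sort_keys=True)


def from_json(text):
    """Rebuild the in-memory model from JSON."""
    raw = json.loads(text)
    return LibraryEntry(
        key=raw["key"],
        attributes=[Attribute(**a) for a in raw["attributes"]],
        peaks=[Peak(**p) for p in raw["peaks"]],
    )


def to_text(entry):
    """Serialize to a hypothetical line-based layout (assumed, not the spec)."""
    lines = [f"<Entry={entry.key}>"]
    lines += [f"{a.accession}|{a.name}={a.value}" for a in entry.attributes]
    lines += [f"{p.mz}\t{p.intensity}\t{p.annotation}" for p in entry.peaks]
    return "\n".join(lines)


def from_text(text):
    """Parse the hypothetical line-based layout back into the model."""
    lines = text.split("\n")
    entry = LibraryEntry(key=int(lines[0][len("<Entry="):-1]))
    for line in lines[1:]:
        if line[0].isdigit():
            # Peak line: m/z, intensity, annotation separated by tabs.
            mz, intensity, annotation = line.split("\t")
            entry.peaks.append(Peak(float(mz), float(intensity), annotation))
        else:
            # Attribute line: accession|name=value.
            accession, _, rest = line.partition("|")
            name, _, value = rest.partition("=")
            entry.attributes.append(Attribute(accession, name, value))
    return entry
```

A short usage example: the same entry is written to both serializations, and each one reads back to an identical object, which is the property a shared data model is meant to provide.

```python
entry = LibraryEntry(
    key=1,
    attributes=[
        Attribute("MS:1000041", "charge state", "2"),
        Attribute("MS:1000744", "selected ion m/z", "401.2"),
    ],
    peaks=[Peak(175.119, 1000.0, "y1"), Peak(445.2, 30.0)],
)
assert from_json(to_json(entry)) == entry
assert from_text(to_text(entry)) == entry
```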



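[Key points]: This paper introduces two standardized formats developed by the Human Proteome Organization Proteomics Standards Initiative (PSI), mzSpecLib and mzPAF, for encoding spectral libraries and fragment ion peak annotations. The aim is to make spectral library data broadly applicable and compatible across applications.

[Methods]: The authors developed mzSpecLib, a flexible data model that encodes metadata about library entries using the extensible PSI-MS controlled vocabulary and that can be converted between different serialization formats. They also developed mzPAF, a separate standardized data model and serialization for fragment ion peak annotations.

[Experiments]: The abstract does not describe any experiments and names no datasets or results. It notes that the mzSpecLib and mzPAF standards have been defined primarily for peptides in proteomics applications, with basic small molecule support, and that they could be extended in the future to other fields that need to encode spectral libraries for non-peptidic analytes.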