How much storage precision can be lost: Guidance for near-lossless compression of untargeted metabolomics mass spectrometry data

Journal of Proteome Research (2023)

Abstract
Background: The size of high-resolution mass spectrometry (HRMS) data has been increasing significantly, and several lossy compressors have been developed to achieve higher compression ratios. However, a comprehensive evaluation of how precision losses in MS data (m/z values and intensities) affect data processing is lacking. Assessing the impact of different degrees of precision loss on data processing results should clarify the variation rates under different precision losses and the reasons for them.

Results: Sixteen vendor files were converted via msConvert to mzML files with different combinations of data precision (32- or 64-bit) for m/z and intensities. mzML files with a suitable precision combination were then converted, by truncation transformations, into precision-lossy files with bounded absolute m/z errors or bounded relative intensity errors. We set an error threshold of 1% to evaluate the feature and compound detection results obtained with MZmine3. Across the different storage-precision combinations for m/z and intensities, the variation was <0.13% for both features and compounds. Five maximum absolute errors for m/z (10⁻⁵, 2×10⁻⁵, 5×10⁻⁵, 10⁻⁴, 10⁻³) and five maximum relative errors for intensities (2×10⁻⁴, 2×10⁻³, 8×10⁻³, 2×10⁻², 2×10⁻¹) were examined. An m/z error of 10⁻⁴ produced a feature detection error of 0.57% and a compound detection error of 1.1%. For intensities, the 2×10⁻² error group produced errors of 4.65% for features and 0.98% for compounds relative to the precision-lossless files. Taken together, we consider that a maximum absolute error of 10⁻⁴ for m/z and a maximum relative error of 2×10⁻² for intensities can meet the 1% error threshold and are recommended for lossy compression.

Conclusion: Our results indicate that mzML files with both m/z and intensities encoded in 32-bit precision are a preferred combination, offering a smaller file size with minor variation.
Further, we examined how varying levels of precision affect MS data processing and provided a reasonable, scenario-specific accuracy proposal (10⁻⁴ for m/z and 2×10⁻² for intensities). This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision loss on downstream data processing.

Competing Interest Statement: The authors have declared no competing interest.
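The abstract does not spell out the exact truncation transformations, so the following Python sketch is illustrative only. It assumes NumPy, runs on synthetic data rather than real spectra, and every function name in it (`float32_roundtrip_error`, `base64_length`, `truncate_mz`, `truncate_intensity`) is hypothetical. It shows (i) the rounding error and base64 size saving from storing a binary array as 32-bit rather than 64-bit floats, as mzML binary data arrays allow, and (ii) two simple ways to impose a bounded absolute m/z error and a bounded relative intensity error: flooring m/z onto a fixed grid, and zeroing low-order float32 mantissa bits. The mantissa-bit approach is a swapped-in technique and not necessarily the one used by the authors.

```python
import base64
import math

import numpy as np


def float32_roundtrip_error(values: np.ndarray) -> float:
    """Largest absolute error introduced by storing float64 values as float32."""
    restored = values.astype(np.float32).astype(np.float64)
    return float(np.max(np.abs(restored - values)))


def base64_length(values: np.ndarray, dtype: str) -> int:
    """Length of the base64 text for a little-endian array of the given dtype (no zlib)."""
    return len(base64.b64encode(values.astype(dtype).tobytes()))


def truncate_mz(mz: np.ndarray, max_abs_error: float) -> np.ndarray:
    """Hypothetical truncation: snap each m/z down onto a grid of step max_abs_error.

    |mz - result| < max_abs_error, up to float64 rounding. A compressor would
    store the integer grid index floor(mz / step) rather than the float itself.
    """
    return np.floor(mz / max_abs_error) * max_abs_error


def truncate_intensity(intensity: np.ndarray, max_rel_error: float) -> np.ndarray:
    """Hypothetical truncation: zero the low mantissa bits of float32 intensities.

    Keeping k explicit mantissa bits bounds the relative error by 2**-k, so k is
    the smallest value with 2**-k <= max_rel_error. Assumes normal (non-subnormal)
    positive floats; zeros stay zero.
    """
    k = min(max(math.ceil(-math.log2(max_rel_error)), 0), 23)  # float32 stores 23 mantissa bits
    drop = 23 - k
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)  # clears the `drop` lowest bits
    bits = intensity.astype(np.float32).view(np.uint32)
    return (bits & mask).view(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mz = rng.uniform(50.0, 1500.0, 10_000)  # synthetic m/z values, not real spectra
    intensity = rng.lognormal(10.0, 2.0, 10_000).astype(np.float32)

    print("float32 m/z round-trip error:", float32_roundtrip_error(mz))
    print("base64 chars, 64-bit vs 32-bit:", base64_length(mz, "<f8"), base64_length(mz, "<f4"))

    mz_t = truncate_mz(mz, 1e-4)  # the recommended m/z bound
    print("max |m/z error|:", np.max(np.abs(mz_t - mz)))

    int_t = truncate_intensity(intensity, 2e-2)  # the recommended intensity bound
    rel = np.abs(int_t - intensity) / intensity
    print("max relative intensity error:", rel.max())
```

For m/z values below 2048, float32 rounding alone introduces at most 2⁻¹⁴ ≈ 6.1×10⁻⁵ absolute error, which sits under the 10⁻⁴ m/z bound recommended above. mzML also allows optional zlib compression of these arrays, which this sketch omits.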