How Much Storage Precision Can Be Lost: Guidance for Near-Lossless Compression of Untargeted Metabolomics Mass Spectrometry Data.

Journal of proteome research(2024)

引用 0|浏览0
暂无评分
摘要
Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. Currently, the impacts of precision losses on MS data processing have not been thoroughly evaluated, which is critical for the future development of lossy compressors. We first evaluated different storage precision (32 bit and 64 bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1 - score, and file sizes between lossy files and lossless files under different conditions. Overall, we revealed that the discrepancy between 32 and 64 bit precision was under 1%. We proposed an absolute m/z error of 10-4 and a relative intensity error of 2 × 10-2, adhering to a 5% error threshold (F1 - scores above 95%). For a stricter 1% error threshold (F1 - scores above 99%), an absolute m/z error of 2 × 10-5 and a relative intensity error of 2 × 10-3 were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要