PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding
CoRR (2024)
Abstract
Recent advances in protein language models have catalyzed significant
progress in peptide sequence representation. Despite extensive exploration in
this field, pre-trained models tailored to peptides remain scarce, because the
complex and sometimes unstable structures of peptides are difficult to
capture. This study introduces a novel
multi-view contrastive learning framework PepHarmony for the sequence-based
peptide encoding task. PepHarmony innovatively combines both sequence- and
structure-level information into a sequence-level encoding module through
contrastive learning. We carefully select datasets from the Protein Data Bank
(PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences
and structures. The experimental results highlight PepHarmony's exceptional
capability in capturing the intricate relationship between peptide sequences
and structures compared with the baseline and fine-tuned models. The robustness
of our model is confirmed through extensive ablation studies, which emphasize
the crucial roles of contrastive loss and strategic data sorting in enhancing
predictive performance. The proposed PepHarmony framework serves as a notable
contribution to peptide representations, and offers valuable insights for
future applications in peptide drug discovery and peptide engineering. The
source code used in this study is publicly available on GitHub at
https://github.com/zhangruochi/PepHarmony or at
http://www.healthinformaticslab.org/supp/.
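
To make the idea of aligning a sequence view with a structure view concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over paired embeddings. The abstract does not specify the exact loss, encoders, or hyperparameters, so the function name `info_nce`, the `temperature` value, and the toy embeddings are illustrative assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(seq_embs, struct_embs, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    seq_embs[i] and struct_embs[i] are assumed to describe the same peptide;
    every other pairing in the batch is treated as a negative.
    The temperature of 0.1 is an arbitrary illustrative choice.
    """
    seq = [l2_normalize(v) for v in seq_embs]
    struct = [l2_normalize(v) for v in struct_embs]
    n = len(seq)

    # sim[i][j]: cosine similarity of sequence i and structure j, over temperature
    sim = [
        [sum(a * b for a, b in zip(seq[i], struct[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(logits):
        # Mean cross-entropy where the correct class of row i is column i,
        # using the log-sum-exp trick for numerical stability.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(logits)

    sim_t = [list(col) for col in zip(*sim)]  # structure-to-sequence direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy check: correctly paired views should give a lower loss than swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch, minimizing the loss pulls each sequence embedding toward the structure embedding of the same peptide and pushes it away from the others in the batch, which is one common way to inject structure-level information into a sequence encoder.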