Information entropy-based differential evolution with extremely randomized trees and LightGBM for protein structural class prediction.

Appl. Soft Comput. (2023)

Abstract
The discovery of protein tertiary structure is the basis of current genetic engineering, medicinal design, and other biological applications. Protein structural class plays a significant role in tertiary structure folding and in the functional analysis of proteins. However, new amino acid sequences are accumulating far faster than tertiary structures are being determined, and existing methods for confirming protein folding cannot keep pace with the volume of sequences or the needs of protein engineering. Accurate prediction on low-similarity protein datasets is therefore particularly critical for deriving the tertiary structure from the primary structure. In this paper, we construct a novel super-large-scale feature set for the primary structure based on secondary structure, evolutionary information, chemical properties, and global descriptors. These diverse and numerous features are used to predict the protein structural class by means of a novel feature selection algorithm and a gradient boosting decision tree model. To verify the effectiveness and robustness of the proposed method, named IDEGBM, we use 10-fold cross-validation on four benchmark datasets: 25PDB, FC699, D1189, and D640. Experimental results show that our method outperforms other state-of-the-art prediction models in both accuracy and efficiency. Furthermore, a representative protein is used to validate that IDEGBM can be applied to improve the conformation prediction of protein tertiary structure.
Keywords
Protein structural class, Prediction model, Feature selection, Evolutionary algorithm, Single objective optimization