Automated Feature Engineering Improves Prediction of Protein–protein Interactions

Amino Acids (2019)

Abstract
Over the last decade, various machine learning (ML) and statistical approaches for protein–protein interaction (PPI) prediction have been developed to help annotate functional interactions among proteins, which are essential for our system-level understanding of life. Efficient ML approaches require informative and non-redundant features. In this paper, we introduce novel types of expert-crafted sequence, evolutionary and graph features and apply automatic feature engineering to further expand the feature space and improve predictive modeling. The two-step automatic feature-engineering process comprises a hybrid method for feature generation and unsupervised feature selection, followed by supervised feature selection through a genetic algorithm (GA). Optimizing both steps allows the feature-engineering procedure to operate on a large transformed feature space at no considerable computational cost and to provide newly engineered features efficiently. Based on GA and correlation filtering, we developed a stacking algorithm, GA-STACK, for automatic ensembling of different ML algorithms to improve prediction performance. We introduce a unified method, HP-GAS, for the prediction of human PPIs, which incorporates GA-STACK and rests on both expert-crafted and 40% of newly engineered features. Extensive cross-validation and comparison with state-of-the-art methods showed that HP-GAS is currently the most efficient method for proteome-wide forecasting of protein interactions, with a prediction efficacy of 0.93 AUC and 0.85 accuracy. We implemented the HP-GAS method as a free standalone application, which is a time-efficient and easy-to-use tool. HP-GAS software with supplementary data can be downloaded from http://www.vinca.rs/180/tools/HP-GAS.php.
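The abstract names two selection stages: an unsupervised filter that removes redundant features, and a supervised search driven by a genetic algorithm. The sketch below illustrates the general idea only; it is not the HP-GAS implementation, which the abstract does not describe in detail. The function names (`pearson`, `correlation_filter`, `ga_select`), the 0.95 correlation threshold, and all GA hyperparameters are assumptions chosen for illustration. In practice the `fitness` callable would presumably be a cross-validated score of a model trained on the selected features; here it is left to the caller.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(columns, threshold=0.95):
    """Unsupervised step (illustrative): greedily keep a feature column only if
    its absolute correlation with every already-kept column is below threshold.
    Returns the indices of the kept columns."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Supervised step (illustrative): evolve boolean feature masks.
    Assumes pop_size >= 4. Uses truncation selection with elitism,
    one-point crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # The better half survives unchanged, so the best mask is never lost.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy data: the second column is an exact multiple of the first.
    cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 1, 4, 1, 5]]
    print(correlation_filter(cols))

    # Hypothetical fitness: reward features 0 and 2, penalize the rest.
    good = {0, 2}
    score = lambda m: sum(1.0 if i in good else -0.5
                          for i, g in enumerate(m) if g)
    print(ga_select(6, score))
```

Running the filter first shrinks the search space the GA has to explore, which is one plausible reading of why the abstract describes the combined procedure as computationally inexpensive.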
Keywords
Protein–protein interactions, Human proteome, Graph, Sequence, Evolutionary features, Machine learning