SPHot: Prediction of Hot Spots in Protein-RNA Complexes by Protein Sequence Information and Ensemble Classifier

IEEE Access (2019)

Abstract
RNA-binding hot spots are a small, complementary set of interfacial residues that contribute most to the binding energy of protein-RNA interfaces. Because experimental methods for identifying hot spots are time-consuming, labor-intensive, and costly, there is great interest in computational approaches that can predict hot spots on a large scale. In this paper, we introduce a sequence-based method that uses an ensemble classifier to predict hot spots in protein-RNA complexes. We first employed three sequence encoding schemes, based respectively on physicochemical properties from the AAindex database, the amino acid substitution matrix BLOSUM62, and the predicted relative accessible surface area. Using these sequence features, we developed 249 individual hot spot predictors with three algorithms: a support vector machine (SVM) with a radial basis function (RBF) kernel, an SVM with a sigmoid kernel, and the k-nearest neighbor (k-NN) algorithm. We explored combinations of these individual predictors under majority voting comprehensively and selected an ensemble vote classifier composed of 43 individual predictors as the final ensemble classifier. The ensemble classifier outperformed state-of-the-art computational methods, yielding an F1 score of 0.843 and an AUC of 0.893 on the training set, and an F1 score of 0.814 and an AUC of 0.842 on the test set. The data and source code are available at http://bioinfo.ahu.edu.cn:8080/SPHot.
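The abstract does not describe how the voting step is implemented. The following is a minimal Python sketch of plain (hard) majority voting over binary per-residue labels, together with the standard F1 score (F1 = 2PR / (P + R)). It assumes that each individual predictor has already labeled every interfacial residue as 1 (hot spot) or 0 (non-hot spot). The function names `majority_vote` and `f1_score` are hypothetical and are not taken from the SPHot source code.

```python
from __future__ import annotations

from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine binary labels from several predictors by hard majority voting.

    predictions[i][j] is predictor i's label (1 = hot spot, 0 = non-hot spot)
    for residue j. This is a hypothetical helper, not SPHot's own code.
    """
    if not predictions:
        raise ValueError("need at least one predictor")
    n_predictors = len(predictions)
    n_residues = len(predictions[0])
    if any(len(p) != n_residues for p in predictions):
        raise ValueError("all predictors must label the same residues")

    combined = []
    for j in range(n_residues):
        votes = sum(p[j] for p in predictions)
        # A residue is called a hot spot only when more than half of the
        # predictors agree; with an odd ensemble size (e.g. 43) ties are impossible.
        combined.append(1 if 2 * votes > n_predictors else 0)
    return combined


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (hot spot) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: three hypothetical predictors labeling five residues.
    preds = [
        [1, 0, 1, 1, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 1, 1, 1],
    ]
    labels = [1, 0, 1, 0, 0]
    ensemble = majority_vote(preds)
    print(ensemble)                     # [1, 0, 1, 1, 0]
    print(f1_score(labels, ensemble))   # approximately 0.8
```

In the toy example, the ensemble recovers both true hot spots and adds one false positive, which gives a precision of 2/3, a recall of 1, and an F1 score of 0.8. Hard voting is used here because the abstract states that the predictors are combined by majority voting. It does not say whether soft (probability-averaging) voting was also tried.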
Keywords
Protein-RNA complexes, hot spot, ensemble approach, protein sequence