Correct machine learning on protein sequences: a peer-reviewing perspective.

Ian Walsh, Gianluca Pollastri, Silvio C E Tosatto

Briefings in Bioinformatics (2016)


Abstract

Machine learning methods are increasingly used to predict protein features from sequences. Machine learning in bioinformatics can be powerful, but it also carries the risk of introducing unexpected biases, which may lead to an overestimation of performance. This article presents a set of guidelines that help both peer reviewers and authors avoid common machine learning pitfalls. Understanding the biology is necessary to produce useful data sets, which have to be large and diverse. Separating the training and test processes is imperative to avoid overselling method performance, which also depends on several hidden parameters. A novel predictor must always be compared with several existing methods, including simple baseline strategies. The guidelines presented here will help nonspecialists appreciate the critical issues in machine learning.
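Two of the points above, keeping training and test data separate and comparing against a simple baseline, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the article: the toy sequences and labels, the function names `split_train_test`, `fit_majority_baseline` and `accuracy`, and the 20% test fraction. A real protein data set would likely also need care that closely similar sequences do not land on both sides of the split, which this sketch does not address.

```python
import random
from collections import Counter


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle once and hold out a test set that training never sees."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_baseline(train):
    """Baseline 'predictor': always output the most common training label."""
    counts = Counter(label for _, label in train)
    majority_label = counts.most_common(1)[0][0]
    return lambda sequence: majority_label


def accuracy(predict, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(seq) == label for seq, label in data) / len(data)


# Hypothetical toy data: (sequence, label) pairs, one third labelled positive.
examples = [("MKV" + "A" * i, i % 3 == 0) for i in range(30)]

train, test = split_train_test(examples)
baseline = fit_majority_baseline(train)   # fitted on training data only
baseline_accuracy = accuracy(baseline, test)
print(f"majority-class baseline accuracy on held-out test set: {baseline_accuracy:.2f}")
```

Any new predictor would be scored on the same held-out `test` list, so that its accuracy can be read against `baseline_accuracy` rather than in isolation.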


Keywords

machine learning, protein sequence, posttranslational modification, predictor, training, evaluation
