Optimizing InterProScan representation generates a surprisingly good protein function prediction method

bioRxiv (2022)

Abstract
Motivation: Automated protein Function Prediction (AFP) is an intensively studied topic. Most of this research focuses on methods that combine multiple data sources, while fewer articles look for the most efficient ways to use a single data source. We therefore tested how different preprocessing methods and classifiers perform in the AFP task when applied to the output of InterProScan (IPS). In particular, we present novel preprocessing methods, less commonly used classifiers, and the inclusion of species taxonomy. We also test classifier stacking for combining the results of the tested classifiers. Methods are tested with in-house data and CAFA3 competition evaluation data.

Results: We show that adding IPS localisation and taxonomy to the data improves results. Stacking also improves performance. Surprisingly, our best-performing methods outperformed all international CAFA3 competition participants in most tests. Altogether, the results show that preprocessing and classifier combinations are beneficial in the AFP task.

Contact: petri.toronen(AT)helsinki.fi

Supplementary information: Supplementary text is available at the project web site and at the end of this document.

Competing Interest Statement: The authors have declared no competing interest.

Key words
InterProScan representation, protein, prediction