Chrome Extension
WeChat Mini Program
Use on ChatGLM

MACE2K: A Text-Mining Tool to Extract Literature-based Evidence for Variant Interpretation using Machine Learning

biorxiv(2020)

Cited 0|Views7
No score
Abstract
Interpretation of a given variant’s pathogenicity is one of the most profound challenges to realizing the promise of genomic medicine. A large amount of information about associations between variants and diseases used by curators and researchers for interpreting variant pathogenicity is buried in biomedical literature. The development of text-mining tools that can extract relevant information from the literature will speed up and assist the variant interpretation curation process. In this work, we present a text-mining tool, MACE2k that extracts evidence sentences containing associations between variants and diseases from full-length PMC Open Access articles. We use different machine learning models (classical and deep learning) to identify evidence sentences with variant-disease associations. Evaluation shows promising results with the best F1-score of 82.9% and AUC-ROC of 73.9%. Classical ML models had a better recall (96.6% for Random Forest) compared to deep learning models. The deep learning model, Convolutional Neural Network had the best precision (75.6%), which is essential for any curation task. ### Competing Interest Statement The authors have declared no competing interest.
More
Translated text
Key words
variant interpretation,machine learning,text-mining,literature-based
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined