Classifying Big DNA Methylation Data: A Gene-Oriented Approach.

Communications in Computer and Information Science(2018)

引用 6|浏览23
暂无评分
摘要
Thanks to Next Generation Sequencing (NGS) techniques, public available genomic data of cancer is growing quickly. Indeed, the largest public database of cancer called The Cancer Genome Atlas (TCGA) contains huge amounts of biomedical big data to be analyzed with advanced knowledge extraction methods. In this work, we focus on the NGS experiment of DNA methylation, whose data matrices are composed of hundred thousands of features (i.e., methylated sites). We propose an efficient data processing procedure that permits to obtain a gene-oriented organization and enables to perform a supervised machine learning analysis with state-of-the-art methods. The procedure divides the original data matrices into several sub-matrices, each one containing the sites located within the same gene. We extract from TCGA DNA methylation data of three tumor types (i.e.,breast, prostate, and thyroid carcinomas) and we are able to successfully discriminate tumoral from non tumoral samples using function-, tree-, and rule-based classifiers. Finally, we select the best performing genes (matrices) ranking them according to the accuracy of the classifiers and we execute an enrichment analysis of them. Those genes can be further investigated by domain experts for proving their relation to the cancers under study.
更多
查看译文
关键词
Classification,DNA methylation,Cancer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要