Classifying Big DNA Methylation Data: A Gene-Oriented Approach.

Emanuel Weitschek,Fabio Cumbo,Eleonora Cappelli,Giovanni Felici,Paola Bertolazzi

Communications in Computer and Information Science（2018）

引用 6|浏览23

暂无评分

摘要

Thanks to Next Generation Sequencing (NGS) techniques, public available genomic data of cancer is growing quickly. Indeed, the largest public database of cancer called The Cancer Genome Atlas (TCGA) contains huge amounts of biomedical big data to be analyzed with advanced knowledge extraction methods. In this work, we focus on the NGS experiment of DNA methylation, whose data matrices are composed of hundred thousands of features (i.e., methylated sites). We propose an efficient data processing procedure that permits to obtain a gene-oriented organization and enables to perform a supervised machine learning analysis with state-of-the-art methods. The procedure divides the original data matrices into several sub-matrices, each one containing the sites located within the same gene. We extract from TCGA DNA methylation data of three tumor types (i.e.,breast, prostate, and thyroid carcinomas) and we are able to successfully discriminate tumoral from non tumoral samples using function-, tree-, and rule-based classifiers. Finally, we select the best performing genes (matrices) ranking them according to the accuracy of the classifiers and we execute an enrichment analysis of them. Those genes can be further investigated by domain experts for proving their relation to the cancers under study.

查看译文

关键词

Classification,DNA methylation,Cancer

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要