Pre-processing Steps for Genome-wide High-density NARAC Dataset Facilitates its Haplotype Block Partitioning

Fatma S. Ibrahim, Mohamed Saad, Ashraf M. Said, Hesham F. A. Hamed

Journal of Advanced Engineering Trends (2021)

Abstract

Pre-processing is a crucial step in preparing any data for in-depth analysis. Genome-wide data are considered big data; handling them is not easy and still poses a significant challenge. Genome-wide association studies (GWAS) rely on enormous, high-density, high-throughput data. This paper illustrates the main pre-processing steps applied to data from the North American Rheumatoid Arthritis Consortium (NARAC) to prepare them for haplotype block partitioning using different methods and on different platforms. Its main objective is to summarize the steps for pre-processing the raw genotyped dataset to prepare it for haplotype block partitioning and further analyses. We present each practical step in clear tables for better visualization, elucidation, and workflow interpretation. We also aim to handle missing data and to normalize the output into a standardized format. Ultimately, this should improve the understanding of such data formats and lay the foundation for critical genome-wide experiments and studies. Thus, this work could serve as a guide for other researchers who use similar data. The pre-processed data will be used for imputation and for haplotype block partitioning with BigLD (under R) and Haploview. Our sequence of pre-processing steps includes preparing the characters in a form suitable for imputation. The next step is recoding the data into 0, 1, 2 format, as required by BigLD. Finally, we prepare the data for Haploview to provide clear haplotype block partitioning, association analysis, and further analyses.
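
The 0, 1, 2 recoding step mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: genotypes arrive as allele-pair strings such as "A_G", "?" marks a missing allele, and 0/1/2 counts copies of the less frequent (minor) allele, which is a common additive coding convention. The function name `recode_snp` and the missing-value symbols are hypothetical, and Python is used here only for illustration; the authors' pipeline targets BigLD under R.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Symbols treated as missing alleles (an assumption, not taken from the paper).
MISSING = {"?", "0", "N", ""}


def recode_snp(genotypes: Sequence[str], sep: str = "_") -> List[Optional[int]]:
    """Recode one SNP column of allele-pair strings into minor-allele counts.

    Returns 0, 1, or 2 per individual, or None where the genotype is missing.
    """
    pairs = []
    counts = Counter()
    for g in genotypes:
        alleles = g.split(sep) if g else []
        if len(alleles) != 2 or any(a in MISSING for a in alleles):
            pairs.append(None)  # keep missing calls for a later imputation step
            continue
        pairs.append(tuple(alleles))
        counts.update(alleles)

    if len(counts) > 2:
        raise ValueError("more than two alleles observed; SNP is not biallelic")

    # Major allele first; ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda a: (-counts[a], a))
    minor = ordered[1] if len(ordered) == 2 else None  # None if monomorphic

    return [None if p is None else sum(a == minor for a in p) for p in pairs]


column = ["A_A", "A_G", "G_G", "?_?", "A_A"]
print(recode_snp(column))
```

Missing genotypes are kept as `None` rather than dropped, so the individuals stay aligned across SNPs and the gaps can be filled later by imputation.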
