谷歌浏览器插件
订阅小程序
在清言上使用

Reference-based data compression for genome in cloud

ICCIP(2016)

引用 2|浏览5
暂无评分
摘要
In this paper, we propose a new reference-based data compression method for efficient compressing of genome sequencing data in FASTQ format. With the advance of the next sequencing technology, the genome data can be generated faster and cheaper, which brings the challenges for efficient storage of these data when used in cloud computing. In order to efficiently store these types of genome data in cloud, content-aware compressing methods have to be developed to make use of the specific file structures. Compared with existing genome-specific compression methods, our proposed content-aware method focused on high compression ratio by taking advantages of repetitive nature of DNA sequence, and using reference genomes in compressing the sequences inside the FASTQ files. The benchmark results of 8 datasets show that our method can achieve highest compression ratio compared with existing FASTQ file compressors.
更多
查看译文
关键词
data compression,genome,reference-based
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要