Managing Massive Multi-Dimensional Array Data With TileDB - Invited Demo Paper

Jacob Bolewski, Stavros Papadopoulos

2017 IEEE International Conference on Big Data (Big Data), 2017

Abstract
TileDB is a system for managing data that are naturally represented as dense or sparse multi-dimensional arrays. TileDB's primary goal is massive scale, where data need to be stored persistently at a low cost, while allowing rapid access during parallel computation. Contrary to traditional data management systems, TileDB is an embeddable C library that can easily be integrated with various higher-level programming languages and scientific computing tools. It supports persistent storage on various backends such as POSIX filesystems, HDFS, Amazon S3, and more. TileDB has been successfully used in genomics as the storage engine of GenomicsDB, a project maintained by the Intel Health and Life Sciences group that is currently integrated with the Broad Institute's GATK4. This paper reviews the data model, architecture and key design principles of TileDB, and outlines our future vision for using TileDB in important scientific applications.
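The distinction between dense and sparse multi-dimensional arrays can be illustrated with a minimal sketch in plain Python. This is not TileDB's API or its on-disk format; the helper names `dense_from_cells` and `sparse_from_dense` are hypothetical and exist only for illustration. The sketch assumes a 2D array in which a fill value of 0 marks empty cells.

```python
# Illustrative sketch only: plain Python, not the TileDB API or storage format.
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def dense_from_cells(shape: Tuple[int, int], cells: Dict[Coord, int], fill: int = 0) -> List[List[int]]:
    """Dense view: every cell in the domain holds a value, including empty ones."""
    rows, cols = shape
    grid = [[fill] * cols for _ in range(rows)]
    for (r, c), value in cells.items():
        grid[r][c] = value
    return grid


def sparse_from_dense(grid: List[List[int]], fill: int = 0) -> Dict[Coord, int]:
    """Sparse view: only non-empty cells are kept, keyed by their coordinates."""
    return {
        (r, c): value
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != fill
    }


# A 3x4 array with two non-empty cells.
cells = {(0, 1): 7, (2, 3): 5}
dense = dense_from_cells((3, 4), cells)
sparse = sparse_from_dense(dense)

dense_cell_count = sum(len(row) for row in dense)  # all cells in the domain
sparse_cell_count = len(sparse)                    # only the non-empty cells
```

In this toy case the dense form materializes all 12 cells while the sparse form keeps 2 coordinate-value pairs, which is the kind of trade-off a storage engine for both array types has to handle.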
Keywords
Multi-dimensional arrays, scientific computing, storage engine