Time series data encoding in Apache IoTDB: comparative analysis and recommendation

The VLDB Journal(2024)

引用 0|浏览1
暂无评分
摘要
Not only the vast applications but also the distinct features of time series data stimulate the booming growth of time series database management systems, such as Apache IoTDB, InfluxDB, OpenTSDB and so on. Almost all these systems employ columnar storage, with effective encoding of time series data. Given the distinct features of various time series data, different encoding strategies may perform variously. In this study, we first summarize the features of time series data that may affect encoding performance. We also introduce the latest feature extraction results in these features. Then, we introduce the storage scheme of a typical time series database, Apache IoTDB, prescribing the limits to implementing encoding algorithms in the system. A qualitative analysis of encoding effectiveness is then presented for the studied algorithms. To this end, we develop a benchmark for evaluating encoding algorithms, including a data generator and several real-world datasets. Also, we present an extensive experimental evaluation. Remarkably, a quantitative analysis of encoding effectiveness regarding to data features is conducted in Apache IoTDB. Finally, we recommend the best encoding algorithm for different time series referring to their data features. Machine learning models are trained for the recommendation and evaluated over real-world datasets.
更多
查看译文
关键词
Time series,Data encoding,Data compression
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要