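Encoding (memory)
Computer science
Benchmark (surveying)
Data mining
Time series
Series (stratigraphy)
Generator (circuit theory)
Scheme (mathematics)
Artificial intelligence
Machine learning
Mathematics
Paleontology
Mathematical analysis
Power (physics)
Physics
Geodesy
Quantum mechanics
Biology
Geography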
Authors
Jinzhao Xiao,Yuxiang Huang,Chengchen Hu,Shaoxu Song,Xiangdong Huang,Jianmin Wang
Source
Journal: Proceedings of the VLDB Endowment
[VLDB Endowment]
Date: 2022-06-01
Volume (issue): 15 (10): 2148-2160
Citations: 10
Identifier
DOI:10.14778/3547305.3547319
Abstract
The wide range of applications and the distinct features of time series data have both driven the rapid growth of time series database management systems such as Apache IoTDB, InfluxDB, and OpenTSDB. Almost all of these systems employ columnar storage with effective encoding of time series data. Given the distinct features of various time series data, it is not surprising that different encoding strategies perform differently. In this study, we first summarize the features of time series data that may affect encoding performance, including scale, delta, repeat, and increase. Then, we introduce the storage scheme of a typical time series database, Apache IoTDB, which constrains how encoding algorithms can be implemented in the system. A qualitative analysis of encoding effectiveness with respect to various data features is then presented for the studied algorithms. To support evaluation, we develop a benchmark for encoding algorithms, including a data generator covering the aforesaid data features and several real-world datasets from our industrial partners. Finally, we present an extensive experimental evaluation using the benchmark. Notably, a quantitative analysis of encoding effectiveness with respect to various data features is conducted in Apache IoTDB.
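As a rough illustration of why features such as delta and repeat can matter for encoding, the sketch below applies two generic textbook techniques, delta encoding and run-length encoding, to a regularly sampled timestamp column. It is only an assumption-laden example: the function names are hypothetical, and the code is not taken from the paper, its benchmark, or Apache IoTDB.

```python
from itertools import groupby


def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the stored differences."""
    out = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        out.append(total)
    return out


def run_length_encode(values):
    """Collapse each run of identical consecutive values into (value, count)."""
    return [(v, len(list(group))) for v, group in groupby(values)]


# Hypothetical column: timestamps sampled at a fixed interval of 10.
timestamps = [1000, 1010, 1020, 1030, 1040]
deltas = delta_encode(timestamps)
runs = run_length_encode(deltas)
print(deltas)
print(runs)
```

In this toy column the deltas are small and highly repetitive, so the run-length pass shrinks five values to two pairs; a series with irregular deltas or few repeats would benefit far less, which is the kind of feature dependence the abstract describes.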