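Computer science
Metadata
Memory footprint
Overhead (engineering)
Throughput
Database
Data mining
Distributed computing
Operating system
Wireless
Authors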
Xuanhua Shi, Zezhao Feng, Kaixi Li, Yongluan Zhou, Hai Jin, Yao Jiang, Bingsheng He, Zhijun Ling, Xin Li
Identifier
DOI:10.1145/3419111.3421289
Abstract
Monitoring large-scale and complex systems often generates high-dimensional and highly dynamic time series data. In such a scenario, massive metadata has to be maintained to support efficient querying, whose large footprint poses great challenges to in-memory databases. In this paper, we present ByteSeries, an in-memory time series database that is designed specifically for large-scale monitoring systems to manage high-dimensional time series. We start with an analysis of the production data and workload at ByteDance's metric monitoring system, which contains over 10 billion time series dimensions. The observation of high overhead of metadata management in high-dimensional time series data calls for a rethink of time series database systems. ByteSeries's memory structure employs the novel Compressed Inverted Index to effectively compress metadata while maintaining high efficiency for multi-dimensional queries. In addition, an algorithm is proposed to effectively convert data into compressed form without sacrificing the data ingestion throughput. We experimentally evaluate ByteSeries by comparing it with ByteDance's original production system, tsdc, as well as two open-source systems, namely Gorilla and Prometheus. We show that ByteSeries significantly improves over ByteDance's original production system by 1) reducing the memory footprint of metadata by 60% and the whole memory consumption by 50%, and 2) speeding up multi-dimensional queries by 1.8x-10.7x.
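To make the idea of metadata-driven multi-dimensional queries concrete, the following is a minimal sketch of a plain, uncompressed inverted index over time series tags. It is not the Compressed Inverted Index described in the abstract, whose design is not given here. The class name `TagIndex`, its methods, and the tag names (`metric`, `host`, `dc`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index mapping (tag key, tag value) to a set of series IDs.

    Hypothetical illustration only; no compression is applied.
    """

    def __init__(self):
        self.postings = defaultdict(set)
        self.next_id = 0

    def add_series(self, tags):
        """Register a series described by a dict of tags and return its ID."""
        series_id = self.next_id
        self.next_id += 1
        for pair in tags.items():
            self.postings[pair].add(series_id)
        return series_id

    def query(self, **filters):
        """Return the sorted IDs of series that match every key=value filter."""
        # .get avoids creating empty posting lists for unknown tag pairs.
        sets = [self.postings.get(pair, set()) for pair in filters.items()]
        if not sets:
            return []
        # Intersect starting from the smallest posting set to limit work.
        sets.sort(key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return sorted(result)


index = TagIndex()
index.add_series({"metric": "cpu", "host": "a", "dc": "east"})
index.add_series({"metric": "cpu", "host": "b", "dc": "west"})
index.add_series({"metric": "mem", "host": "a", "dc": "east"})
```

In this sketch every tag pair stores a full set of series IDs, so the metadata grows with the number of series times the number of dimensions. That growth is the kind of overhead the abstract reports as motivating a compressed representation.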