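Bottleneck
Computer science
Workload
Cloud computing
Job scheduler
Batch processing
Scheduling (production processes)
Zipf's law
Operating system
Cluster (spacecraft)
Idle
Data center
Distributed computing
Database
Real-time computing
Embedded system
Operations management
Economics
Statistics
Mathematics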
Authors
Congfeng Jiang,Yitao Qiu,Weisong Shi,Zhefeng Ge,Jiwei Wang,Shenglei Chen,Christophe Cérin,Zujie Ren,Guoyao Xu,Jiangbin Lin
Source
Journal: IEEE Transactions on Cloud Computing
[Institute of Electrical and Electronics Engineers]
Date: 2020-10-28
Volume (issue): 10 (4): 2381-2397
Citations: 25
Identifier
DOI:10.1109/tcc.2020.3034500
Abstract
Workload characteristics are vital for both data center operation and job scheduling in co-located data centers, where online services and batch jobs are deployed on the same production cluster. In this article, a comprehensive analysis is conducted on Alibaba's cluster-trace-v2018, collected from a production cluster of 4034 machines. The findings and insights are the following: (1) The workload on the production cluster exhibits a daily cyclical fluctuation in CPU and disk I/O utilization, and the memory system has become the performance bottleneck of the co-located cluster. (2) Batch jobs, including their tasks and derived instances, can be approximated by a Zipf distribution. However, batch jobs with directed acyclic graph (DAG) dependencies suffer from co-location with online services, since the online services are given higher priority. (3) The resource usage of containers shows cyclical fluctuation consistent with that of the whole cluster, while their memory usage remains approximately constant. (4) The number of batch jobs co-located with online services depends on the mispredictions per kilo instructions (MPKI) of the online services. To guarantee the QoS of online services, the number of batch jobs co-located on the same machine should decrease when the MPKI of online services rises.
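As an illustration of finding (2), one common way to check whether per-job counts look Zipf-like is to fit a straight line to log(count) against log(rank) and read off the slope. The sketch below is only an assumption about how such a check might be done, not the authors' method: the function name `zipf_exponent` is hypothetical, and the input is synthetic data rather than values from cluster-trace-v2018.

```python
import math


def zipf_exponent(counts):
    """Estimate a Zipf exponent s by least squares on log(count) vs. log(rank).

    Under an ideal Zipf law, count(rank) is proportional to rank ** -s,
    so the log-log plot is a straight line with slope -s.
    """
    # Rank the positive counts from largest to smallest.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying distribution; return s = -slope.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic, made-up data: instances per job decaying as 1 / rank.
    synthetic_counts = [1000.0 / rank for rank in range(1, 51)]
    print(round(zipf_exponent(synthetic_counts), 3))
```

An exponent close to 1 on real data would be consistent with a classic Zipf shape, though a log-log regression is only a rough diagnostic and not a formal goodness-of-fit test.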