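Computer science
Scheduling (production processes)
Cloud computing
Inefficiency
Virtual machine
Distributed computing
Operating system
Engineering
Operations management
Economics
Microeconomics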
Authors
Wang Liang, Jinzhe Yang, Jidong Zhai, Guangwen Yang
Source
Journal: IEEE Transactions on Parallel and Distributed Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-11-01
Volume (issue): 35 (12): 2315-2330
Identifier
DOI:10.1109/tpds.2023.3329298
Abstract
Virtual machines (VMs) rely heavily on virtual CPU (vCPU) scheduling to achieve efficient I/O performance. vCPU scheduling interference can cause inconsistent scheduling latency and degraded I/O performance, potentially compromising the services provided by affected VMs. Existing solutions have limitations, such as inefficiency in diagnosing interference issues or imposing undesired side effects on cloud systems. To address these challenges, we present Otter, a holistic technique for optimizing I/O performance in the presence of vCPU scheduling interference. Otter employs innovative methods to enhance interference diagnosis efficiency. First, we propose lightweight methods to measure the dynamic changes in scheduling latencies for co-running vCPUs, ensuring both flexibility and accuracy. Second, we propose fine-grained quantification methods to detect interference promptly, with low false positive and false negative rates. Third, we identify interference patterns that aid in analyzing the root causes of interference and preventing similar issues from recurring. Otter has been operational for one year in the production cloud at the National Supercomputing Center (Wuxi). It has diagnosed and helped fix more than 470 vCPU scheduling interference-related issues, resulting in a 19.6% improvement in cloud service I/O performance with negligible overhead in production.