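Cloud computing
Reliability (semiconductor)
Risk analysis (engineering)
Perspective (graphical)
Lead (geology)
Computer science
Production (economics)
Computer security
Process management
Business
Power (physics)
Physics
Quantum mechanics
Geomorphology
Artificial intelligence
Economics
Macroeconomics
Geology
Operating system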
Authors
Vaibhav Ganatra, Anjaly Parayil, Supriyo Ghosh, Yu Kang, Minghua Ma, Chetan Bansal, Suman Nath, Jonathan Mace
Identifier
DOI:10.1145/3611643.3613898
Abstract
Cloud providers use automated watchdogs or monitors to continuously observe service availability and to proactively report incidents when system performance degrades. Improper monitoring can lead to delays in the detection and mitigation of production incidents, which can be extremely expensive in terms of customer impacts and manual toil from engineering resources. Therefore, a systematic understanding of the pitfalls in current monitoring practices and how they can lead to production incidents is crucial for ensuring continuous reliability of cloud services.