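Microservices
Scalability
Root cause analysis
Tracing (psycholinguistics)
Root cause
Benchmark (surveying)
Computer science
Root (linguistics)
Software deployment
Ranking (information retrieval)
Set (abstract data type)
Artificial intelligence
Distributed computing
Data mining
Software engineering
Operating system
Reliability engineering
Engineering
Philosophy
Cloud computing
Linguistics
Programming language
Geography
Geodesy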
Authors
Zeyan Li, Junjie Chen, Rui Jiao, Ning Zhao, Zhijun Wang, Shuwei Zhang, Yicheng Wu, Jiang Long, Leiqin Yan, Zikai Wang, Zhekang Chen, Wenchi Zhang, Xin Nie, Kaixin Sui, Dan Pei
Source
Venue: International Workshop on Quality of Service
Date: 2021-06-25
Citations: 32
Identifiers
DOI:10.1109/iwqos52092.2021.9521340
Abstract
An increasing number of systems adopt microservice architecture because of its benefits for delivery, scalability, and autonomy. It is essential but challenging to localize root-cause microservices promptly when a fault occurs. Traces are helpful for root-cause microservice localization, and thus many recent approaches utilize them. However, these approaches are less practical because they rely on supervision or other unrealistic assumptions. To overcome their limitations, we propose a more practical root-cause microservice localization approach named TraceRCA. The key insight of TraceRCA is that a microservice with more abnormal and fewer normal traces passing through it is more likely to be the root cause. Based on this insight, TraceRCA consists of three steps: trace anomaly detection, suspicious microservice set mining, and microservice ranking. We conducted experiments on hundreds of injected faults in a widely used open-source microservice benchmark and a production system. The results show that TraceRCA is effective in various situations. The top-1 accuracy of TraceRCA outperforms the state-of-the-art unsupervised approaches by 44.8%. In addition, TraceRCA has been deployed in a large commercial bank, where it helps operators localize root causes of real-world faults accurately and efficiently. We also share some lessons learned from our real-world deployment.
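The key insight in the abstract can be illustrated with a small sketch. The code below is not the scoring method from the paper; it is a minimal illustration that assumes traces have already been labeled normal or abnormal by some upstream detector. The function name `rank_microservices`, the input layout, and the `coverage * purity` score are all assumptions chosen for illustration.

```python
from collections import Counter


def rank_microservices(traces):
    """Rank microservices by their association with abnormal traces.

    `traces` is a list of (services, is_abnormal) pairs, where `services`
    is the set of microservice names a trace passed through. The labels
    are assumed to come from an upstream anomaly detector (not shown).
    """
    abnormal_hits = Counter()
    normal_hits = Counter()
    total_abnormal = 0
    for services, is_abnormal in traces:
        if is_abnormal:
            total_abnormal += 1
        target = abnormal_hits if is_abnormal else normal_hits
        for svc in set(services):
            target[svc] += 1

    scores = {}
    for svc in set(abnormal_hits) | set(normal_hits):
        a, n = abnormal_hits[svc], normal_hits[svc]
        # Fraction of all abnormal traces that pass through this service.
        coverage = a / total_abnormal if total_abnormal else 0.0
        # Fraction of the traces through this service that are abnormal.
        purity = a / (a + n)
        # Illustrative score (an assumption): high only when both are high.
        scores[svc] = coverage * purity
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical traces: every abnormal trace touches "payment".
example_traces = [
    ({"gateway", "order", "payment"}, True),
    ({"gateway", "order", "payment"}, True),
    ({"gateway", "order"}, False),
    ({"gateway", "user"}, False),
    ({"gateway", "order", "payment"}, True),
]
ranking = rank_microservices(example_traces)
```

In this made-up example, "payment" ranks first because all abnormal traces and no normal traces pass through it, while "gateway" ranks lower because it also carries every normal trace.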