Computer science
Parsing
Sentence
Representation (politics)
Field (mathematical analysis)
Word (group theory)
Artificial intelligence
Data mining
Labeled data
Coding (set theory)
Machine learning
Natural language processing
Set (abstract data type)
Mathematical analysis
Linguistics
Philosophy
Mathematics
Politics
Political science
Law
Programming language
Authors
Shimin Tao, Yilun Liu, Weibin Meng, Zuomin Ren, Hao Yang, Xun Chen, Liang Zhang, Xie Yu-ming, Chang Su, Xiaosong Qiao, Weinan Tian, Yichen Zhu, Tao Han, Ying Qin, Yun Li
Identifier
DOI:10.1109/iwqos57198.2023.10188759
Abstract
Automated log analysis has been widely applied in modern data-center networks, performing critical tasks such as log parsing, log anomaly detection, and log-based failure prediction. However, existing approaches rely on hand-crafted features or domain-specific vectors to represent logs, which are either laborious in manual effort or ineffective when facing multiple domains in a system. Furthermore, general-purpose word embeddings are not optimized for log data and are thus data-inefficient in handling complex log analysis tasks. In this paper, we present a pre-training phase for language models to understand both in-sentence and cross-sentence features of logs, resulting in a unified representation of logs that is well-suited for various downstream analysis tasks. The pre-training phase is unsupervised, utilizing 0.45 billion logs from 16 diverse domains. Experiments on 12 publicly available evaluation datasets across 3 tasks indicate the superiority of our approach over existing approaches, especially in online scenarios with limited historical logs. Our approach also exhibits remarkable few-shot learning ability and domain adaptiveness: it not only outperforms existing approaches using only 0.0025% of their required training data, but also adapts to new domains via only a few in-domain logs. We release our code and pre-trained model.
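The unsupervised pre-training described in the abstract (learning in-sentence features of logs without labels) can be illustrated with a minimal data-preparation sketch. The tokenizer, the `<NUM>`/`<MASK>` placeholders, and the mask ratio below are illustrative assumptions, not the paper's actual implementation:

```python
import random
import re

def tokenize_log(line):
    # Hypothetical tokenizer: split on whitespace and replace pure
    # numbers with a placeholder so learning focuses on log structure.
    tokens = re.split(r"\s+", line.strip())
    return ["<NUM>" if t.isdigit() else t for t in tokens]

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    # Masked-token objective (an assumption here): hide a fraction of
    # tokens and record the originals as reconstruction targets.
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_ratio:
            masked.append("<MASK>")
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

# Example log line (made up for illustration).
log = "2023-07-01 10:32:11 ERROR disk 3 failed after 120 retries"
tokens = tokenize_log(log)
masked, targets = mask_tokens(tokens)
```

A language model pre-trained on such masked log sequences would then be fine-tuned, or used few-shot, for the downstream tasks the abstract lists (parsing, anomaly detection, failure prediction).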