计算机科学
自然语言理解
自然语言处理
术语
语言模型
杠杆(统计)
利用
文档
人工智能
自然语言
数据科学
机器学习
哲学
语言学
计算机安全
程序设计语言
作者
Lipeng Ma,Weidong Yang,Bo Xu,Sihang Jiang,Ben Fei,Jiaqing Liang,M. Zhou,Yanghua Xiao
标识
DOI:10.1145/3597503.3623304
摘要
Logs as semi-structured text are rich in semantic information, making their comprehensive understanding crucial for automated log analysis. With the recent success of pre-trained language models in natural language processing, many studies have leveraged these models to understand logs. Despite their successes, existing pre-trained language models still suffer from three weaknesses. Firstly, these models fail to understand domain-specific terminology, especially abbreviations. Secondly, these models struggle to adequately capture the complete log context information. Thirdly, these models have difficulty in obtaining universal representations of different styles of the same logs. To address these challenges, we introduce KnowLog, a knowledge-enhanced pre-trained language model for log understanding. Specifically, to solve the previous two challenges, we exploit abbreviations and natural language descriptions of logs from public documentation as local and global knowledge, respectively, and leverage this knowledge by designing novel pre-training tasks for enhancing the model. To solve the last challenge, we design a contrastive learning-based pre-training task to obtain universal representations. We evaluate KnowLog by fine-tuning it on six different log understanding tasks. Extensive experiments demonstrate that KnowLog significantly enhances log understanding and achieves state-of-the-art results compared to existing pre-trained language models without knowledge enhancement. Moreover, we conduct additional experiments in transfer learning and low-resource scenarios, showcasing the substantial advantages of KnowLog. Our source code and detailed experimental data are available at https://github.com/LeaperOvO/KnowLog.
科研通智能强力驱动
Strongly Powered by AbleSci AI