计算机科学
纳克
计算机网络
服务(商务)
自然语言处理
经济
语言模型
经济
作者
Jiaojiao Jiang,Jean-Guy Schneider,Steve Versteeg,Jun Han,Arafat Hossain,C.T. Liu
标识
DOI:10.1016/j.jnca.2021.103247
摘要
Automatically discovering message formats of unknown service or system protocols from network traces has become important for a variety of applications, such as emulating the behavior of an unknown protocol in service virtualization, or enabling deep packet inspection in network security. Among existing schemes, the keyword extraction based approaches have been shown to be effective. Inspired by the template structure of protocol messages, recent works leverage the positions of keywords to extract message keywords more accurately. However, these methods are deficient for messages with large variations in length. To address this problem, we propose R-gram , which exploits the relative positions of keywords in messages, allowing the keywords to be robustly detected in variable length messages. It first extracts the common template of the messages in a given message trace with a fast sampling technique, and segments each message into blocks according to the relative positions of the common keywords in the template. It then identifies message keywords in each block by using a new concept and technique — r elative positional n -gram ( r -gram in short). Finally, the message keywords are used to separate all the messages into type-specific clusters and consequently derive the message format for each cluster. We have implemented and evaluated R-gram on real-world service traces containing either textual or binary protocol messages. Our experimental results show that R-gram is more accurate and robust than existing state-of-the-art tools in protocol message format extraction. Furthermore, R-gram is efficient for processing large-scale message traces.
科研通智能强力驱动
Strongly Powered by AbleSci AI