R-gram: Inferring message formats of service protocols with relative positional n-grams

Jiaojiao Jiang, Jean-Guy Schneider, Steve Versteeg, Jun Han, MD Arafat Hossain, Chengfei Liu

Research output: Contribution to journal › Article › Research › peer-review

1 Citation (Scopus)

Abstract

Automatically discovering message formats of unknown service or system protocols from network traces has become important for a variety of applications, such as emulating the behavior of an unknown protocol in service virtualization, or enabling deep packet inspection in network security. Among existing schemes, keyword-extraction-based approaches have been shown to be effective. Inspired by the template structure of protocol messages, recent works leverage the positions of keywords to extract message keywords more accurately. However, these methods are deficient for messages with large variations in length. To address this problem, we propose R-gram, which exploits the relative positions of keywords in messages, allowing the keywords to be robustly detected in variable-length messages. It first extracts the common template of the messages in a given message trace with a fast sampling technique, and segments each message into blocks according to the relative positions of the common keywords in the template. It then identifies message keywords in each block by using a new concept and technique, the relative positional n-gram (r-gram for short). Finally, the message keywords are used to separate all the messages into type-specific clusters and consequently derive the message format for each cluster. We have implemented and evaluated R-gram on real-world service traces containing either textual or binary protocol messages. Our experimental results show that R-gram is more accurate and robust than existing state-of-the-art tools in protocol message format extraction. Furthermore, R-gram is efficient for processing large-scale message traces.
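The idea of a relative positional n-gram can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes the common template keywords are already known (the paper extracts them with a sampling technique not shown here), uses character n-grams, and applies a simple support threshold. The function names `split_into_blocks`, `relative_ngrams`, and `frequent_relative_ngrams`, and the parameters `n` and `min_support`, are hypothetical names chosen for illustration.

```python
from collections import Counter


def split_into_blocks(message, template_keywords):
    """Cut a message at each template keyword (in order) and return the
    text segments between them. Assumption: keywords are given as input."""
    blocks, cursor = [], 0
    for kw in template_keywords:
        idx = message.find(kw, cursor)
        if idx < 0:
            blocks.append("")  # keyword absent: keep block indices aligned
            continue
        blocks.append(message[cursor:idx])
        cursor = idx + len(kw)
    blocks.append(message[cursor:])  # text after the last keyword
    return blocks


def relative_ngrams(message, template_keywords, n=3):
    """Yield (block_index, offset_in_block, ngram) triples, so that each
    n-gram is positioned relative to its block, not to the message start."""
    for b, block in enumerate(split_into_blocks(message, template_keywords)):
        for off in range(len(block) - n + 1):
            yield (b, off, block[off:off + n])


def frequent_relative_ngrams(messages, template_keywords, n=3, min_support=0.5):
    """Keep the relative n-grams that occur in at least `min_support`
    (a fraction) of the messages. The threshold rule is an assumption."""
    counts = Counter()
    for msg in messages:
        counts.update(set(relative_ngrams(msg, template_keywords, n)))
    threshold = min_support * len(messages)
    return {key for key, c in counts.items() if c >= threshold}


# Toy trace: the request paths differ greatly in length.
messages = [
    "GET /index.html HTTP/1.1\r\nHost: a.example\r\n",
    "GET /a/very/long/path/to/resource HTTP/1.1\r\nHost: b.example\r\n",
    "POST /submit HTTP/1.1\r\nHost: c.example\r\n",
]
template = [" HTTP/1.1\r\nHost: "]  # assumed common template keyword
found = frequent_relative_ngrams(messages, template, n=3, min_support=0.6)
```

In this toy trace, the substring `.ex` sits at a different absolute offset in every message because the request paths vary in length, but it is always at offset 1 of the block that follows the template keyword, so it is counted as one relative positional n-gram. Grouping messages by such block-relative keywords (for example `GET` versus `POST` at the start of the first block) is the kind of type-specific clustering the abstract describes.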

Original language: English
Article number: 103247
Number of pages: 12
Journal: Journal of Network and Computer Applications
Volume: 196
Publication status: Published - 15 Dec 2021
Externally published: Yes

Keywords

  • Format extraction
  • Protocol messages
  • R-gram
  • Relative positional keywords
