Memory-Efficient Regular Expression Search Using State Merging. Authors: Michela Becchi, Srihari Cadambi. Publisher: INFOCOM 2007, 26th IEEE International Conference on Computer Communications, IEEE. Presenter: Ching-Hsuan Shih. Date: 2014/04/09.
Memory-Efficient Regular Expression Search Using State Merging
Authors: Michela Becchi, Srihari Cadambi
Publisher: INFOCOM 2007, 26th IEEE International Conference on Computer Communications, IEEE
Presenter: Ching-Hsuan Shih
Date: 2014/04/09
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.
Outline
• Introduction
• Related Work
• State Merging: A Motivational Example
• State Merging in DFAs
• Bitmap-based Data Structures for DFAs
• Experimental Results
National Cheng Kung University CSIE Computer & Internet Architecture Lab
Introduction (1/2)
• Network Intrusion Detection System (NIDS)
  • A device or software application that monitors a network for malicious activity.
  • Most IDSs inspect network packets, system logs, or network flows.
• Regular Expressions
  • Current rule-sets such as Snort and Bro, among many others, are replacing strings with more powerful and expressive regular expressions.
Introduction (2/2)
• The classical method to perform regular expression search is to use a deterministic finite automaton (DFA).
• The main problem with DFAs is prohibitive memory usage:
  • The number of states in a DFA scales poorly with the size of the regular expressions it represents and with the number of wildcards they contain.
• We propose a novel technique that allows non-equivalent states in a DFA to be merged, using a scheme in which the transitions in the DFA are labeled.
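As a point of reference for the slides that follow, table-driven DFA search can be sketched as below. The pattern `ab+c`, the state numbering, and the names `build_table` and `dfa_search` are illustrative assumptions, not taken from the paper. Each state stores one next-state entry per possible input byte, which is where the memory cost comes from.

```python
ALPHABET_SIZE = 256  # one table column per possible input byte


def build_table():
    """Hypothetical DFA that finds the pattern ab+c anywhere in the input.

    States: 0 = start, 1 = seen 'a', 2 = seen 'ab+', 3 = accepting.
    """
    # By default every byte sends every state back to the start state.
    table = [[0] * ALPHABET_SIZE for _ in range(4)]
    for state in range(4):
        table[state][ord("a")] = 1  # an 'a' always (re)starts a match
    table[1][ord("b")] = 2
    table[2][ord("b")] = 2
    table[2][ord("c")] = 3
    return table, {3}


def dfa_search(data: bytes, table, accepting) -> bool:
    """Scan the input with exactly one table lookup per byte."""
    state = 0
    for byte in data:
        state = table[state][byte]
        if state in accepting:
            return True
    return False


table, accepting = build_table()
print(dfa_search(b"xxabbbc", table, accepting))
print(sum(len(row) for row in table), "table entries for 4 states")
```

Even this four-state automaton needs 4 × 256 entries in an uncompressed table, so the table grows linearly with the number of states, and the number of states is what scales poorly.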
Related Work
• Delayed DFA (D2FA) [6]:
  • It identifies two (or more) states that transition to the same set of destinations on the same input characters.
  • D2FA achieves memory compaction by removing duplicated transitions, but at the expense of latency.
  • States with a default transition may require more than one transition per input character.
• In [14]:
  • The authors propose increasing the speed of regular expression search by expanding the alphabet.
  • They process two characters (bytes) per state transition in the DFA.
  • This produces an exponential increase in memory usage.
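The latency cost of default transitions can be illustrated with a minimal sketch. The three-state automaton over the alphabet {a, b, c}, the dictionaries `LABELED` and `DEFAULT`, and the function `d2fa_step` are hypothetical and chosen only to show the mechanism; they are not taken from [6].

```python
# Hypothetical full DFA being compressed:
#   state 0: a->1, b->0, c->0
#   state 1: a->1, b->2, c->0
#   state 2: a->1, b->2, c->0
# Each state keeps only the transitions that differ from its default state.
LABELED = {
    0: {"a": 1, "b": 0, "c": 0},  # root state keeps every transition
    1: {"b": 2},                  # differs from state 0 only on 'b'
    2: {},                        # identical to state 1
}
DEFAULT = {1: 0, 2: 1}  # default transition of each non-root state


def d2fa_step(state, char, labeled, default):
    """Return (next_state, lookups) for one input character."""
    lookups = 1
    # Follow default transitions until a state stores this character.
    while char not in labeled[state]:
        state = default[state]
        lookups += 1
    return labeled[state][char], lookups


print(d2fa_step(0, "c", LABELED, DEFAULT))
print(d2fa_step(2, "a", LABELED, DEFAULT))
```

In this sketch the stored transitions drop from 9 to 4, but consuming `a` in state 2 visits three states instead of one.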
State Merging: A Motivational Example (1/4)
State Merging: A Motivational Example (2/4)
• The merged state is represented as 3_4.
• The transition [g-i]/0, j/1 indicates that the same next state (here, state 5) is reached from state 3_4 on input characters g, h, or i with label 0, or on input character j with label 1.
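One way to read the notation above is as a lookup keyed by both the input character and the current label. The sketch below encodes only the transition described on this slide; the dictionary layout and the names `MERGED_3_4` and `next_state` are assumptions made for illustration, and combinations the slide does not describe simply return `None`.

```python
# Hypothetical encoding of the transition [g-i]/0, j/1 out of merged state 3_4:
# input character -> {source label: next state}.
MERGED_3_4 = {}
for ch in "ghi":
    MERGED_3_4[ch] = {0: 5}  # g, h, i lead to state 5 when the label is 0
MERGED_3_4["j"] = {1: 5}     # j leads to state 5 when the label is 1


def next_state(transitions, label, ch):
    """Return the next state, or None if the slide gives no such transition."""
    return transitions.get(ch, {}).get(label)


print(next_state(MERGED_3_4, 0, "h"))
print(next_state(MERGED_3_4, 1, "j"))
print(next_state(MERGED_3_4, 1, "g"))
```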
State Merging: A Motivational Example (3/4)
State Merging: A Motivational Example (4/4)
• The merged state is represented as 1_2.
• The transition a.0/0,1 from state 3_4 to state 1_2 means:
  • The transition carries a label 0 that tells its destination state, 1_2, that the transition is meant for the underlying original state 1.
  • The transition is taken when its source state 3_4 receives label 0 or 1.
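A traversal step over labeled transitions can be sketched as follows: the search carries a (state, label) pair, and each transition both checks the incoming label and emits the label for its destination. Only the transition a.0/0,1 comes from the slide. The data layout, the names `TRANSITIONS`, `step`, and `original_state`, and the convention that label i selects the i-th component of a merged state's name are assumptions made for illustration.

```python
# (source state, input char) -> list of (source labels, dest state, dest label)
TRANSITIONS = {
    # a.0/0,1: taken on 'a' when the current label is 0 or 1;
    # the destination 1_2 receives label 0.
    ("3_4", "a"): [(frozenset({0, 1}), "1_2", 0)],
}


def step(state, label, ch):
    """Return (next_state, next_label), or None if nothing matches."""
    for source_labels, dest, dest_label in TRANSITIONS.get((state, ch), []):
        if label in source_labels:
            return dest, dest_label
    return None


def original_state(merged_state, label):
    """Assumed convention: label i selects the i-th original state."""
    return merged_state.split("_")[label]


nxt = step("3_4", 1, "a")
print(nxt)
print(original_state(*nxt))
```

Under that assumed convention, the label 0 carried into 1_2 resolves to original state 1, which matches the description on the slide.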
State Merging in DFAs (1/3)
A. Labels
• For every transition connecting two merged states, we define source labels and destination labels, e.g. c.ld/l0, l1…
B. Legality of State Merging
State Merging in DFAs (2/3)
C. Merging and Labeling Algorithm
State Merging in DFAs (3/3)
Bitmap-based Data Structure for DFAs (1/3)
• Basic:
Bitmap-based Data Structure for DFAs (2/3)
• Bitmap-based:
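The figure for this slide is not reproduced here. As a generic illustration only, one common bitmap scheme (an assumption, not necessarily the exact layout used in the paper) keeps one bit per input character, set when the next state differs from that of the preceding character, together with a short list holding one next state per run. The names `compress_row` and `lookup` and the example row are hypothetical.

```python
def compress_row(row):
    """Compress one state's next-state row into (bitmap, run targets)."""
    bitmap = 0
    targets = []
    for ch, nxt in enumerate(row):
        # A set bit marks the start of a new run of identical next states.
        if ch == 0 or nxt != row[ch - 1]:
            bitmap |= 1 << ch
            targets.append(nxt)
    return bitmap, targets


def lookup(bitmap, targets, ch):
    """Find the next state for ch by counting set bits up to and including ch."""
    mask = (1 << (ch + 1)) - 1
    index = bin(bitmap & mask).count("1") - 1
    return targets[index]


# Hypothetical row: digits go to state 3, lowercase letters to state 7,
# everything else back to state 0.
row = [0] * 256
row[ord("0"):ord("9") + 1] = [3] * 10
row[ord("a"):ord("z") + 1] = [7] * 26

bitmap, targets = compress_row(row)
print(len(targets), "stored next states instead of", len(row))
print(lookup(bitmap, targets, ord("q")))
```

Rows dominated by character ranges compress well under such a scheme, because each range contributes a single stored entry.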
Bitmap-based Data Structure for DFAs (3/3)
• Bitmap-based merged data structure:
Experimental Results (1/2)
• Note that the Snort rule-sets have lower percentages of distinct next-state transitions than the Bro rule-sets. This is due to the large number of character ranges (both in the form [c1-c2] and \d, \D, \w, \W, \s, \S) and to the fact that Snort regular expressions are not case sensitive.
Experimental Results (2/2)
• The width of the transition table is set to 32 bits.