Author ： Chad R. meiners ,Jignesh Patel ,Eric Norige ,Eric Torng , Alex X. Liu Publisher ：

Fast Regular Expression Matching using Small TCAMs for Networking Intrusion Detection and Prevention Systems Author： Chad R. meiners ,Jignesh Patel ,Eric Norige ,Eric Torng , Alex X. Liu Publisher： 19th USENIX SECURITY SYMPOSIUM Presenter： Zong-Lin Sie Date： 2010/10/13

Introduction • RE matching is a core component of deep packet inspection in modern networking devices. • Propose the first hardware-based RE matching approach that uses TCAMs. • Three novel techniques to reduce TCAM space and improve RE matching speed. (1) transition sharing (2) table consolidation (3) variable striding

Background (1/3) • RE matching algorithms are typically based on the Deterministic Finite Automata(DFA) representation of regular expression. • A DFA is a 5-tuple （Q，Σ，δ，q0，A） Q：states Σ ：alphabet ：start state A ：accept state δ ：ΣxQ → Q (transition function)

Background (2/3) • Fundamental issue ： DFA-based algorithms needs large amount of memory required to store transition table δ . • Directly encoding a DFA using one TCAM entry per transition will lead to a prohibitive amount of TCAM space. • Two techniques that minimize the TCAM space for storing a DFA ： (1) transition sharing (2) table consolidation

Background (3/3) • Second challenge is improving RE matching speed and thus throughput. • By up to a factor of k is to use k-stride DFAs that consume k input characters per transition. • This leads to an exponential increase in both state and transition spaces. →variable striding

Provided approach • Use a TCAM and its associated SRAM to encode the transitions of the DFA built from an RE set where one TCAM entry might encode multiple DFA transitions. • ex ： “01101111,01100011” →0001101111 →01 →0101100011 →10

Transition sharing • Basic idea：combine multiple transitions into a single TCAM entry. • Character Bundling – exploits character redundancy by combing multiple transitions from the same source state to the same destination state into one TCAM entry • Shadow Encoding – use ternary codes in the source state ID field to encode multiple source states.

Character bundling • 4-steps： (1) assign each state a unique ID (2) for each state enumerate all 256 rules (3) minimize the number of transitions (4) concatenate by each rule with its source ID ASCII code： a = 61 (hex) o = 6f (hex)

Shadow encoding • ASCII code： • a = 61 (hex) • b = 62 (hex) • …

Shadow encoding (determine order) • First problem：Find the best order of the state tables in the TCAM given that any order is allowed. • Use some concepts such as default transitions and . • (Delayed input DFAs) is a DFA with default transitions where each state p can have at most one default transition to another state q in the .

Shadow encoding (determine order) • The directed graph consisting of only default transitions is called deferment forest. • We determine the order of state tables in TCAM by constructing a deferment forest and using the partial order defined by the deferment forest. • We say that state p defers to state q , ( ) , if there is a directed graph from p to q in the deferment forest. • We say that state p is in state q’s shadow. • state q ‘s transition table must be placed after the transition tables of all states in state q’s shadow.

Shadow encoding (determine order) • Algorithm for constructing deferment forests： (1) construct a Space Reduction Graph . (2) trim away edges with small weight from SRG. (3) compute a deferment forest by running Kruskal’s algorithm to find a maximum weight spanning forest. (4) for each deferment tree, pick the state that has largest number of transitions going back to itself as the root.

Shadow encoding Original DFA

Shadow encoding(choose transition) • Second problem：Identity entries to remove from each state table given this order. • Construct for given DFA and its deferment forest as follows ： If state p has a default transition to state q, we remove common transitions in both p’s transition table and q’s transition table from p’s transition table.

Shadow encoding(scheme) • Third problem：choose binary IDs and ternary codes for each state support the given order and removed entries.

Table Consolidation • Basic idea：merge multiple transition tables into one transition table. • TCAM entries from 11 to 5

Table Consolidation • Two concepts to define table consolidation： (1) k-decision rule：a rule whose decision is an array of k decisions. (2) k-decision table：a sequence of k-decision rules following the first-match semantics. • Given k-decision table T, if for any rule r in T we delete all the decisions except the i-th decision, we get a 1-decision table, denoted T[i]. • We take a set of k 1-decision tables and construct a k-decision table T that holds for each i. • The process of computing k-decision table called Table Consolidation.

Table Consolidation • First problem：how to consolidate k 1-decision transition tables into a k-decision transition table. • Using two concepts： (1) breakpoint (2) critical range (def ：δ(s,i-1) ≠ δ(s,i) ,i is a breakpoint of state s) b(s)：set of breakpoints for state s b(S) ：set of breakpoints for a set of states S. r(S) ：set of ranges defined by b(S) • When we consolidate s1 and s2 together, we compute b({s1,s2}) and r({s1,s2})

Table Consolidation • Case(1) ：for each ,where r’ is not a deferred range for both s1 and s2, we create a consolidated transition rule where the decision of entry is the ordered pair of decisions for state s1 and s2 on r’. • Case(2) ：for each , where r’ is a deferred range for one but not the other, we fill in r’ in the incomplete transition table where it is deferred, and we create a consolidated transition rule where the decision of entry is the ordered pair of decisions for state s1 and s2 on r’. • Case(3) ： for each ,where r’ is a deferred range for both s1 and s2, we don’t create an entry.

Table Consolidation • Second problem：which 1-decision transition tables should be consolidated together. • A consolidated deferment tree must satisfy the following properties： (1) each node is to be consolidated with at most one node in the second tree (2) a level i node in one tree must be consolidated with a level i node in the second tree

Variable Striding • We improve RE matching throughput by consuming multiple characters per TCAM lookup. • Although k-stride DFAs can speed up RE matching by up to a factor of k, the number of states and transitions can grow exponentially in k. • A k-var-stride DFA consumes between 1 and k characters in each transition with at least one transition consuming k characters. • Each transition is labeled with (1) a unique string of k characters (2) a stride length j(1<j<k) indicating the number of character consumed

Variable Striding • Eliminating State Explosion：By ending any k-var-stride transition path at the first accepting state it reaches. • Controlling Transition Explosion：character bundling and shadow encoding can’t afford the exponential growth of transition. (1) k-var-stride transition sharing (2) self-loop unrolling (for root)

Variable Striding (k-var-stride transition sharing) • We first need to decide on the deferment relationship among states. • The k-var-stride DFA cannot be finalized before we need to compute the deferment relationship among states. (subject to many factors such as TCAM space) • Two approximation options ： (1) 1-stride DFA (2) full k-stride DFA • Thus, we simply use the deferment forest of the 1-stride DFA in computing the transition tables for k-var-stride DFA.

Variable Striding (k-var-stride transition sharing) • For any two states s1 and s2 where s1 defers to s2, we need to compute s1’s k-var-stride transitions that are not shared with s2 because those transitions will constitute s1’s k-var-stride transition table. • A dynamic programming algorithm to compute ： non-shared transitions for a k-stride DFA can be quickly computed from the non-shared transitions of a (k-1)-var-stride DFA

Variable Striding (Self-Loop Unrolling) • Most of root states are self-looping. • Direct expansion → exponential increase in table size. • Self-loop unrolling increases the stride of all the self-loop transitions encoded by the last default TCAM entry.

Experimental results • Only develop the algorithms that would take an RE set and construct the associated TCAM entries • Estimated throughput • TS：transition sharing；TC#：table consolidation

Experimental results • Use their Scale dataset to test scalability ： from 26 REs(DFA size 1275) to 34 REs(DFA size 305339) Build ：per DFA state required to build the non-overlapping sets BW ：per DFA state required to minimize these transition tables

Experimental results • Results on 7-var-stride DFAs group(a)：Bro217,C613 group(b)：C7,C8,C10 (all wildcard closure) group(c)：Snort24,Smort31 Snort34 (40% wildcard closure)

Author ： Chad R. meiners ,Jignesh Patel ,Eric Norige ,Eric Torng , Alex X. Liu Publisher ：