1 / 24

An Improved Algorithm to Accelerate Regular Expression Evaluation

An Improved Algorithm to Accelerate Regular Expression Evaluation. Michela Becchi and Patrick Crowley ANCS 2007. Context. Regular expression matching is a critical operation in networking Intrusion detection Context based billing Peer-to-peer traffic detection and prioritization

donald
Télécharger la présentation

An Improved Algorithm to Accelerate Regular Expression Evaluation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Improved Algorithm to Accelerate Regular Expression Evaluation Michela Becchi and Patrick Crowley ANCS 2007

  2. Context • Regular expression matching is a critical operation in networking • Intrusion detection • Context based billing • Peer-to-peer traffic detection and prioritization • Application level filtering • Challenge: perform regular expression matching at line rate, given data-sets of hundreds (or thousands) of patterns • Processing time • Memory requirement (occupancy and bandwidth)

  3. Background & Problem definition • Two algorithmic solutions • Non deterministic finite automata (NFAs) • High time complexity/memory bandwidth requirements • Compact representation • Deterministic finite automata (DFAs) • Low time complexity • Potentially high storage requirement • Multiple implementation approaches • FPGA [Sidhu 2001, Clark 2003, Moscola 2003] • Software [Paxson 1998, Roesch 1999, Tuck 2004] • Custom hardware [Kumar 2006] • Problem: given a DFA, find a representation • compact • allowing an acceptable bound of memory bandwidth requirement/processing time

  4. Background - D2FA s3 s3 a a b s4 b s4 • Observation: • DFAs from practical datasets have redundancy in state transitions • Idea: • default transitions: non-consuming transitions s1 s1 c c s5 s5 s3 a b s4 s2 c s6 s2 c s6 • Implication: • Traversal time / memory bandwidth requirement dependent upon maximum default path length

  5. Background – D2FA construction RegEx: ab+c+, cd+ and bd+e from 1-8 b Space reduction graph a b 3 2 1 2 1 c 3 4 Remaining transitions a 4 c 3 5 c 3 8 3 c 0 d 4 4 4 c d d 0 4 5 4 c 4 d 4 c 7 4 c 4 c 8 4 e b 4 5 4 d 6 7 5 6 4 b d from 3-8

  6. from 1-8 b a b 1 2 c Remaining transitions a c 3 c c d c d d 0 4 5 c d c c c 8 e b d 6 7 b d from 3-8 Background – D2FA construction RegEx: ab+c+, cd+ and bd+e Space reduction graph [4] 3 [4] 1 2 4 3 4 [3] [4] 5 3 8 3 [3] 0 4 4 4 4 4 4 7 4 [2] 4 [4] 4 4 5 4 5 6 [3] 4 [3] Diameter bound=4  removed transitions=33

  7. from 1-8 b a b 1 2 c Remaining transitions a c 3 c c d c d d 0 4 5 c d c c c 8 e b d 6 7 b d from 3-8 Background – D2FA construction RegEx: ab+c+, cd+ and bd+e D2FA Space reduction graph b [4] b 3 [4] 1 2 1 2 4 3 c 4 [3] 3 [4] 5 d a 3 8 3 [3] c 0 4 e 4 d 4 d 4 0 5 4 4 d 4 7 4 b [2] 4 8 [4] 4 e 4 5 4 d 6 7 5 6 [3] 4 [3] Diameter bound=4  removed transitions=33

  8. Background – D2FA construction RegEx: ab+c+, cd+ and bd+e D2FA Space reduction graph from 1-8 b b [4] a b b 3 [4] 1 2 1 2 1 2 c 4 3 Remaining transitions a c 4 [3] 3 [4] c 5 d a 3 3 c 8 3 [3] c 0 c d 4 e 4 d 4 d c d 4 0 5 d 4 0 4 5 4 d c 4 7 4 d b [2] c 4 c 8 4 e c 8 e 4 b 5 4 d 6 7 d 5 6 7 6 [3] 4 [3] b Diameter bound=4 removed transitions=33 d from 3-8 [2] [2] 3 1 2 3 4 2 [2] [2] 4 3 8 5 3 Traversal time=O((D/2+1)N) Time complexity=O(n2logn) Space complexity=O(n2) 0 4 4 4 [1] 4 3 4 3 [2] 7 [1] 4 4 4 4 5 4 4 [2] 5 [2] 6 4 Diameter bound=2 removed transitions=28

  9. Transition redundancy: why? RegEx: ab+c+, cd+ and bd+e from 1-8 b a b 1 2 c Remaining transitions a c 3 c c d c d d 0 4 5 c d c c c 8 e b 6 7 • Forward transitions: • Matches • State specific • Backward transitions: • Mismatch • Shared by multiple states d b from 3-8 d • Idea: • Introduce state depth: minimum distance from entry state • Orient default transitions only backwards (towards decreasing depth) • Pros: • Traversal time O(2N)independent of the maximum default path length • Generality: no need of diameter bound parameter • Cons: • Possible compression loss

  10. from 1-8 b 3 a 1 2 b b 4 3 1 2 c 4 b 2 1 Remaining transitions a 3 8 5 3 c 3 c 0 4 c 4 4 d 4 c 3 a d c d 4 c d 0 4 5 e d 7 4 4 c d 4 d 0 4 5 c c 4 4 d c 5 8 e b b 5 8 6 d 4 e 6 7 d b 6 7 d from 3-8 Our scheme RegEx: ab+c+, cd+ and bd+e Reduced graph Oriented space reduction graph [2] [1] [3] [1] [2] [0] [3] [1] [2] NO diameter bound Time complexity=O(2N) removed transitions=33 • Observations: • Maximum spanning tree on oriented graph: Edmonds and Chu solutions • 2 steps: edge selection and cycle resolution • No cycles: • Space reduction graph not necessary • Simple breath-first traversal algorithm Traversal time=O(2N) Time complexity=O(n2) Space complexity=O(n)

  11. Discussion • Generalization (Jon Turner’s observation) • Allowing default transitions only from depth d to depth ≤d-k, w/ k1, leads to worst case traversal time • Time and space complexity of the construction algorithm still O(n2) and O(n) • Examples: • k=1  traversal O(2N) • k=2  traversal O(1.5N) • k=3  traversal O(1.33N) • k=4  traversal O(1.25N) • Compression • D2FA: • Constraint: diameter bound • Heuristic • Our algorithm: • Constraint: orientation (may be not a problem for RegEx originated DFAs) • Optimal solution

  12. Discussion (cont’d) • Default transitions and depth computation through breath-first traversal • Default transitions can be computed during subset construction, that is, at DFA creation time. • D2FA space complexity can be an issue for big DFAs • O(n2): space reduction graph • Using adjacency list 17B/edge struct wgedge { vertex l,r; // endpoints of the edge weight wt; // edge weight edge lnext; //link to next edge incident to l edge rnext; //link to next edge incident to r } *edges; • Fully connected graph w/ ~11K nodes will require 1GB storage • Possible solutions: partial graphs based on weight • Multiple scans • Effect on algorithm’s execution time

  13. Traversal locality DFA traversal exhibits locality Average traffic tends to mismatch States at low depths tend to be traversed more Backward default transition reiterate the traversal of likely states Discussion (cont’d)

  14. Alphabet reduction • Observation: • Some symbols are treated in the same way over the whole DFA [(s,ci)= (s,cj) for each state sє DFA] • Example: • Ignore case • \n, \r • unused characters • Idea: • Group characters into classes • Mapping filter • Algorithm: • Sequence of clustering operations • Breath-first traversal w/ O(n2) complexity • Applicable at DFA creation time

  15. Evaluation: rule-sets

  16. Evaluation - compression

  17. Evaluation – number of transitions x16 x4 x23 x9.5 x8 x8 x35

  18. Alphabet reduction’s effect

  19. Further decreasing the traversal time

  20. Conclusion • DFA exhibit transition redundancy exploitable through default transitions • D2FA: algorithm trading off compression w/ traversal time • In this work, we propose generic algorithm: • With limited time and space complexity • Allowing O(2N) traversal time (or less) when processing input text • Leading to compression level similar to (or better than) D2FA

  21. Thank you! Questions?

  22. Default transitions targets

  23. Default transitions targets (cont’d)

  24. Our algorithm procedure default_transition (DFAdfa=(n, (states, )), modifies setdefault); listqueue; setdepth[n]; for state sstates  depth[s]=n; default[s]=s; rof depth[0]=0;queue.push(0); while (!queue.empty()) state s= queue.pop(); int saving=0; for char c if (depth[(s,c)]=n)  depth[(s,c)]= depth[s]+1; queue.push((s,c)); fi rof; for (state tstates & depth[t]<depth[s])  intcommon:=# common transitions btw. s and t; if (common > 1 && (common>saving || (common=saving && depth[t]<depth[default[s]]))) saving:=common; default[s]=t; fi rof; end while; end;

More Related