Advanced Algorithms for Fast and Scalable Deep Packet Inspection

Advanced Algorithms for Fast and Scalable Deep Packet Inspection. Sailesh Kumar, Jonathan Turner, John Williams. Why Regular Expressions Acceleration? RegEx are now widely used: network intrusion detection systems (NIDS), Layer 7 switches, load balancing.


Presentation Transcript


  1. Advanced Algorithms for Fast and Scalable Deep Packet Inspection. Sailesh Kumar, Jonathan Turner, John Williams

  2. Why Regular Expressions Acceleration?
     • RegEx are now widely used
       • Network intrusion detection systems (NIDS)
       • Layer 7 switches, load balancing
       • Firewalls, filtering, authentication and monitoring
       • Content-based traffic management and routing
     • RegEx matching is expensive
       • Space: large amount of memory
       • Bandwidth: requires 1+ state traversals per byte
     • RegEx is a performance bottleneck
       • In enterprise switches from Cisco, etc.
       • In many security appliances
       • These use DFAs with 1+ GB of memory and still reach only sub-gigabit throughput
     • Need to accelerate RegEx!

  3. Can we do better?
     • Well studied in compiler literature
     • What’s different in networking?
     • Can we do better?
     • Construction time versus execution time (grep)
       • Traditionally, (construction + execution) time is the metric
       • In the networking context, execution time is critical
       • Also, there may be thousands of patterns
     • DFAs are fast
       • But can have an exponentially large number of states
       • Algorithms exist to minimize the number of states
       • Still 1) low performance and 2) gigabytes of memory

  4. Delayed Input DFA (D2FA), SIGCOMM’06
     • Many transitions
       • 256 transitions per state
       • 50+ distinct transitions per state (real-world datasets)
       • Need 50+ words per state
     • Reduce the number of transitions in a DFA
     [Figure: five-state DFA (states 1-5) for the three rules a+, b+c, c*d+, with 4 transitions per state.]
     Look at state pairs: there are many common transitions. How to remove them?

  5. Delayed Input DFA (D2FA), SIGCOMM’06
     [Figure: the five-state DFA for the three rules a+, b+c, c*d+ (4 transitions per state), shown next to an alternative representation.]
     Alternative representation: fewer transitions, less memory.

  6. D2FA Operation
     [Figure: the DFA and the equivalent D2FA side by side, states 1-5.]
     Heavy edges are called default transitions.
     Take a default transition whenever a labeled transition is missing.
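The default-transition rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the three-state automaton over the alphabet {a, b, c} is a made-up toy (it is not the five-state example from the slides), and the names `DFA`, `D2FA`, `d2fa_step`, `run_dfa` and `run_d2fa` are hypothetical. The sketch assumes the root of the default tree (state 0) keeps a labeled transition for every symbol, so default-path traversal always terminates.

```python
from itertools import product

# Plain DFA: every state stores a transition for every symbol (9 in total).
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 0},
}

# Equivalent D2FA: state -> (labeled transitions, default state).
# Transitions that a state shares with its default state are dropped.
D2FA = {
    0: ({"a": 1, "b": 0, "c": 0}, None),  # root: keeps all transitions
    1: ({"b": 2}, 0),                     # only 'b' differs from state 0
    2: ({}, 0),                           # identical to state 0
}


def d2fa_step(state, ch):
    """Consume one symbol; return (next_state, states_visited)."""
    accesses = 1
    labeled, default = D2FA[state]
    # Follow default transitions until a state has a labeled edge for ch.
    while ch not in labeled:
        state = default
        labeled, default = D2FA[state]
        accesses += 1
    return labeled[ch], accesses


def run_dfa(text, state=0):
    for ch in text:
        state = DFA[state][ch]
    return state


def run_d2fa(text, state=0):
    total = 0
    for ch in text:
        state, n = d2fa_step(state, ch)
        total += n
    return state, total
```

In this toy the D2FA stores 4 labeled transitions plus 2 default pointers instead of 9 transitions, at the cost of sometimes visiting two states for a single input symbol; that extra work is the overhead the following slides set out to remove.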

  7. D2FA versus DFA
     • D2FAs are compact but require multiple memory accesses
       • Up to 20x more memory accesses
       • Not desirable in an off-chip architecture
     • Can D2FAs match the performance of DFAs?
       • Yes!
     • Content Addressed D2FAs (CD2FA)
       • CD2FAs require only one memory access per byte
       • Match the performance of a DFA in a cacheless system
       • In systems with a data cache, CD2FAs are 2-3x faster
       • CD2FAs are 10x more compact than DFAs

  8. Introduction to CD2FA
     • How to avoid the multiple memory accesses of D2FAs?
       • Avoid the lookup that decides whether the default path needs to be taken
       • Avoid default path traversal
     • Solution: assign a label to each state; labels contain:
       • Characters for which the state has labeled transitions
       • Information about all of its default states
       • Characters for which its default states have labeled transitions
     [Figure: default path V → U → R. Root R has transitions on all characters; U has labeled transitions on c and d; V has labeled transitions on a and b.]
     Content labels:
       • R (label R): find node R at location R
       • U (label cd,R): find node U at hash(c,d,R)
       • V (label ab,cd,R): find node V at hash(a,b,hash(c,d,R))

  9. Introduction to CD2FA
     [Figure: two default paths, V → U → R and X → Y → Z. Content labels: U = cd,R; V = ab,cd,R; Y = lm,Z; X = pq,lm,Z. Addresses: U at hash(c,d,R), V at hash(a,b,hash(c,d,R)), X at hash(p,q,hash(l,m,Z)). The input characters a and d are shown.]
     Current state: V (label = ab,cd,R) → X (label = pq,lm,Z)
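The content-label idea can be sketched as follows. This is an assumption-laden illustration, not the scheme's actual encoding: a label is modeled as a tuple of character groups (the state's own characters first, then those of each default state) plus the root's location, `zlib.crc32` stands in for whatever hash the authors use, and the transition targets stored in `MEMORY` are invented for the example. The names `address`, `owner_address`, `step` and `MEMORY` are hypothetical.

```python
import string
import zlib


def address(label):
    """Nested hash of a label, e.g. hash(a,b,hash(c,d,R)) for (("ab","cd"),"R")."""
    segments, root = label
    addr = root  # the root is stored at its own, known location
    for chars in reversed(segments):
        addr = zlib.crc32(f"{chars}|{addr}".encode())
    return addr


def owner_address(label, ch):
    """Find, from the label alone, which state on the default path owns ch."""
    segments, root = label
    for i, chars in enumerate(segments):
        if ch in chars:
            # The owner's label is the tail of the current label.
            return address((segments[i:], root))
    return root  # no state on the path lists ch: the root handles it


R = ((), "R")
U = (("cd",), "R")
V = (("ab", "cd"), "R")
X = (("pq", "lm"), "Z")

# Each entry maps an input character to the label of the next state.
# The targets below are illustrative assumptions only.
MEMORY = {
    address(R): {ch: R for ch in string.ascii_lowercase},
    address(U): {"c": U, "d": X},
    address(V): {"a": X, "b": V},
}


def step(label, ch):
    """One memory read per input character, with no default-path traversal."""
    return MEMORY[owner_address(label, ch)][ch]
```

Starting from V, `step(V, "d")` reads U's entry directly because V's label already says that d belongs to U, so no intermediate lookup is needed. The sketch ignores hash collisions, which is the problem the next slides address.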

  10. Construction of CD2FA
      • We seek to keep the content labels small
      • Twin objectives:
        • Ensure that states have few labeled transitions
        • Ensure that default paths are as short as possible
      • The D2FA construction heuristic based on a maximum-weight spanning tree creates long default paths
        • Limiting default paths => less space-efficient D2FAs
      • Proposed a new heuristic called CRO to construct D2FAs
        • Runs in 3 phases: Construction, Reduction and Optimization
        • Default path bound = 2 edges => the CRO algorithm constructs up to 10x more space-efficient D2FAs
      • CD2FAs are constructed from these D2FAs

  11. Memory Mapping in CD2FA
      [Figure: the two default paths V → U → R and X → Y → Z with the hashed addresses hash(c,d,R), hash(a,b,hash(c,d,R)) and hash(p,q,hash(l,m,Z)); a collision between hashed addresses is marked COLLISION.]
      We have assumed that hashing is collision free.

  12. Collision-free Memory Mapping
      [Figure: four states with content labels (abc, …), (pqr, …), (lmn, …) and (def, …) are mapped to 4 memory locations. The candidate addresses shown are hash(abc, …), hash(pqr, …), hash(lmn, …), hash(mln, …), hash(def, …) and hash(edf, …).]
      We need a systematic approach.

  13. Bipartite Graph Matching
      • Bipartite graph
        • Left nodes are state content labels
        • Right nodes are memory locations
        • Map state labels to unique memory locations
        • An edge for every choice of content label
      • Perfect matching problem
        • With n left and right nodes
        • Need O(log n) random edges per node
        • n = 1M implies we need ~20 edges per node (log2 of 10^6 is about 20)
      • If we provide slight memory over-provisioning
        • We can uniquely map state labels with far fewer edges
      • In our experiments, we found a perfect matching without memory over-provisioning
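The mapping step can be illustrated with a standard augmenting-path bipartite matching (Kuhn's algorithm). This is a generic textbook technique chosen for the sketch; the slides do not say which matching algorithm the authors use. The function name `match_states_to_slots` and the candidate lists in the test data are hypothetical: each state is assumed to come with a few candidate memory locations, one per alternative encoding of its content label.

```python
def match_states_to_slots(candidates):
    """Assign each state a distinct memory slot from its candidate list.

    candidates: dict mapping state -> list of candidate slot numbers.
    Returns a dict state -> slot, or None if no perfect matching exists.
    """
    slot_owner = {}  # slot -> state currently placed there

    def try_place(state, visited):
        for slot in candidates[state]:
            if slot in visited:
                continue
            visited.add(slot)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if slot not in slot_owner or try_place(slot_owner[slot], visited):
                slot_owner[slot] = state
                return True
        return False

    for state in candidates:
        if not try_place(state, set()):
            return None
    return {state: slot for slot, state in slot_owner.items()}


# Four states, four memory locations, as in the collision-free mapping slide.
# The candidate slot numbers are made up for illustration.
example = {
    "abc": [0, 1],
    "pqr": [0],
    "lmn": [1, 2],
    "def": [2, 3],
}
mapping = match_states_to_slots(example)
```

Here "abc" first takes slot 0, but "pqr" can only live in slot 0, so the search moves "abc" to its alternative slot 1. More candidate edges per state make such rearrangements more likely to succeed, which is the intuition behind the O(log n) edge count on the slide.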

  14. Memory Reduction Results

  15. Throughput Results
      [Figure: throughput plot, annotated "3x faster" with a 4 KB cache.]

  16. Conclusion
      • We have proposed CD2FAs
        • Match or surpass a DFA in throughput
        • 10x less memory than a table-compressed DFA
      • Novel randomized memory mapping algorithm based on maximum matching in a bipartite graph
        • Zero space overhead
        • Zero bandwidth overhead
      • Thank you. Questions?
