
Reverse Hashing for Sketch Based Change Detection in High Speed Networks

Reverse Hashing for Sketch Based Change Detection in High Speed Networks. Ashish Gupta, Elliot Parsons, with Robert Schweller (Theory Group). Advisor: Yan Chen. Class Presentation, June 2004, Network Security, Computer Science Department, Northwestern University.


Presentation Transcript


  1. Reverse Hashing for Sketch Based Change Detection in High Speed Networks • Ashish Gupta, Elliot Parsons, with Robert Schweller (Theory Group) • Advisor: Yan Chen • Class Presentation, June 2004, Network Security • Computer Science Department, Northwestern University

  2. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  3. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  4. Anomaly Detection • Goes beyond signature detection • Two popular types: • Heavy Hitter Detection • Change detection: very broad, from simple change to statistical methods • Online real-time → difficult • Heavy hitter: some solutions proposed • Heavy change → ? • Scalability with high speed traffic • Large number of flows: large memory required • Performance penalty • Scalable Change Detection: Sketch to the rescue!

  5. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  6. What is a sketch ? • Probabilistic summary of data streams • Widely used in database research to handle massive data streams • Array of hash tables: Tj[K] (j = 1, …, H)

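  7. What is a sketch? • [Figure: H hash tables (rows 1, …, j, …, H), each with K buckets (0, 1, …, K-1); key k is mapped by h1(k), …, hj(k), …, hH(k) to one bucket per table] • Update (k, u): Tj[hj(k)] += u (for all j) • Estimate v(S, k): sum of updates for key k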
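The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the class name `Sketch`, the SHA-256 based hash family, and the median-of-buckets estimator are all assumptions made for the example, since the slide only states that the estimate is the sum of updates for key k.

```python
import hashlib
from statistics import median


class Sketch:
    """H hash tables of K counters each: T_j[K], j = 1..H."""

    def __init__(self, num_tables=5, num_buckets=4096):
        self.H = num_tables
        self.K = num_buckets
        self.tables = [[0] * self.K for _ in range(self.H)]

    def _bucket(self, j, key):
        # Hypothetical hash family: SHA-256 salted with the table index j.
        digest = hashlib.sha256(f"{j}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.K

    def update(self, key, u):
        # Update (k, u): T_j[h_j(k)] += u for all j.
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += u

    def estimate(self, key):
        # Assumed estimator: median over the H tables of the key's bucket value.
        # Collisions with other keys can distort individual tables.
        return median(
            self.tables[j][self._bucket(j, key)] for j in range(self.H)
        )


sketch = Sketch()
sketch.update("10.0.0.1", 3)
sketch.update("10.0.0.1", 4)
total = sketch.estimate("10.0.0.1")  # only one key was inserted, so no collisions
```

Note that `estimate` needs the key as input, which is exactly the limitation the next slides discuss.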

  8. Using Sketch for anomaly detection • Requires very little space: • E.g. 5 hash tables with 16 K buckets = 360 K • High speed memory usable • Still able to reconstruct the values with high accuracy • Its main problem: • To look up the value of a key, you must already know the key • We can tell that an anomaly occurred, but not which keys caused it!

  9. Using Sketch for anomaly detection • Requires very little space: • E.g. 5 hash tables with 16 K buckets = 360 K • High speed memory usable • Still able to reconstruct the values with high accuracy • Its main problem: • To look up the value of a key, you must already know the key • We can tell that an anomaly occurred, but not which keys caused it!

  10. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  11. How can we figure out the keys without storing them explicitly? • Our contribution

  12. Step 1: Taking Intersections • Each hash table → independent hash function • Each key maps to a different bucket in each table • Each bucket maps to a large set of keys • Example: key maps to buckets b1, b2, b3, b4, b5; let Ai be the set of keys that map to bi in table i • Intersect A1, A2, A3, A4, A5 → really small set! • E[x] << 1 for 5 hash tables (ref. our paper)
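The intersection step can be illustrated on a toy key space. The sketch below is only an illustration under stated assumptions: an 8-bit key space, 16 buckets per table, and hypothetical hash functions of the form ((a·k + b) mod 257) mod 16 with made-up constants. None of these values come from the slides.

```python
NUM_TABLES = 5
NUM_BUCKETS = 16
KEY_SPACE = range(256)  # toy 8-bit key space (assumption)
P = 257                 # small prime for the hypothetical hash family

# Hypothetical (a, b) parameters, one pair per hash table.
PARAMS = [(3, 7), (5, 11), (7, 13), (11, 17), (13, 19)]


def h(j, key):
    """Hash function of table j."""
    a, b = PARAMS[j]
    return ((a * key + b) % P) % NUM_BUCKETS


def reverse_set(j, bucket):
    """A_j: all keys that table j maps to the given bucket."""
    return {k for k in KEY_SPACE if h(j, k) == bucket}


def recover(anomalous_buckets):
    """Intersect the reverse-mapped sets of one anomalous bucket per table."""
    candidates = set(KEY_SPACE)
    for j, bucket in enumerate(anomalous_buckets):
        candidates &= reverse_set(j, bucket)
    return candidates


culprit = 148
buckets = [h(j, culprit) for j in range(NUM_TABLES)]
candidates = recover(buckets)  # always contains the culprit, usually little else
```

Here `reverse_set` enumerates the whole key space, which is only feasible because the toy key space is tiny; that cost is the problem the following slides address.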

  13. The problem with simple intersection • Why is this difficult? • One to many mapping • Each set Ai can be very large! • E.g. for IP addresses the key space is 2^32. For 2^12 buckets → 2^20 keys per bucket!

  14. Modular hashing • Problem with intersections: • How do we store these huge mappings? • How do we take intersections of these huge sets? • Modular hashing: • Partition the key into separate words • Hash each word separately • [Figure: 32-bit key 10010100 10101011 10010101 10100011 split into four 8-bit words]

  15. Modular hashing reduces the set size • [Figure: 32-bit key 10010100 10101011 10010101 10100011 split into four 8-bit words; h1(), h2(), h3(), h4() hash each word to 3 bits: 010 110 001 101] • Greatly reduces size of reverse mapped sets

  16. Modular hashing • 2^8 / 2^3: only 32 elements per partition • For 8 bit to 3 bit hashing: each bucket maps to 2^5 = 32 keys → small!
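A minimal sketch of modular hashing for a 32-bit key follows, assuming four 8-bit words each hashed to 3 bits as on the slides. The per-word hash used here (a seeded random permutation of 0..255, keeping the top 3 bits) is a hypothetical choice that makes every 3-bit value have exactly 32 preimages; the slides do not say which word hash the authors use.

```python
import random

WORD_BITS = 8   # each word of the key is 8 bits
HASH_BITS = 3   # each word is hashed down to 3 bits
NUM_WORDS = 4   # a 32-bit key has four words


def make_word_hash(seed):
    """Balanced 8-bit to 3-bit hash as a lookup table (hypothetical construction)."""
    rng = random.Random(seed)
    perm = list(range(1 << WORD_BITS))
    rng.shuffle(perm)
    # Top 3 bits of a permutation of 0..255: each value 0..7 appears 32 times.
    return [p >> (WORD_BITS - HASH_BITS) for p in perm]


WORD_HASHES = [make_word_hash(seed) for seed in range(NUM_WORDS)]


def split_words(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    return [(key >> (WORD_BITS * i)) & 0xFF for i in reversed(range(NUM_WORDS))]


def modular_hash(key):
    """Hash each word separately and concatenate the 3-bit results (12 bits)."""
    bucket = 0
    for table, word in zip(WORD_HASHES, split_words(key)):
        bucket = (bucket << HASH_BITS) | table[word]
    return bucket


def reverse_words(bucket):
    """Per word position, the 32 word values that hash to that position's chunk."""
    result = []
    for i, table in enumerate(WORD_HASHES):
        shift = HASH_BITS * (NUM_WORDS - 1 - i)
        chunk = (bucket >> shift) & ((1 << HASH_BITS) - 1)
        result.append([w for w in range(1 << WORD_BITS) if table[w] == chunk])
    return result


key = 0x94AB95A3  # 10010100 10101011 10010101 10100011, the key from the slides
bucket = modular_hash(key)
per_word = reverse_words(bucket)
```

The reverse mapping is stored as four lists of 32 word values instead of one set of 2^20 keys, and intersections can be taken word by word.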

  17. Modular Hashing is Efficient • Very efficient in space and time: • If n is the key space, m is hash space, q is number of words, • Space = [formula missing in transcript]: logarithmic in key space • Run time (intersections) = [formula missing in transcript]: poly-log in key space • Set q = O(log n)

  18. An important problem: spatial locality • This hashing scheme is biased, not uniform • In network streams, IP addresses show strong spatial locality • E.g. many addresses fall into 120.105.56.* • These would be mapped into very few buckets → large number of collisions → low sketch accuracy • Solution: IP mangling

  19. Without IP mangling: skewed !

  20. IP Mangling removes correlations • Key idea: randomize the input data to destroy correlations • Must also be reversible!

  21. Theory of Modular Linear Equations • f(x) ≡ a·x mod n • a is chosen randomly • To be invertible: a and n must be relatively prime • Can be easily reversed: replace a by a^(-1), the multiplicative inverse of a mod n • This function is highly effective in resolving the skewed distribution
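The mangling function and its inverse can be sketched as follows, assuming 32-bit IPv4 keys so that n = 2^32. The multiplier `A` is a fixed odd constant chosen here only to keep the example reproducible (the slide says a is chosen randomly); any odd a is relatively prime to 2^32.

```python
N = 1 << 32             # key space size for IPv4 addresses (assumption: n = 2^32)
A = 2654435761          # hypothetical multiplier; odd, hence coprime with 2^32
A_INV = pow(A, -1, N)   # modular inverse of A mod N (requires Python 3.8+)


def mangle(x):
    """f(x) = a * x mod n"""
    return (A * x) % N


def unmangle(y):
    """Reverse the mangling by multiplying with a^(-1) mod n."""
    return (A_INV * y) % N


# Addresses sharing a prefix are consecutive integers before mangling.
base = (129 << 24) | (105 << 16) | (56 << 8)
addresses = [base + i for i in range(4)]
mangled = [mangle(x) for x in addresses]
restored = [unmangle(y) for y in mangled]
```

After mangling, the four neighbouring addresses differ in their high-order words as well, so a word-by-word hash no longer sees identical prefixes.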

  22. With IP mangling: uniform !

  23. Recap • Modular hashing: makes intersection time and space efficient • IP mangling: removes the non-uniformity of modular hashing • Intersections of reverse mapped sets: converge to the culprit key

  24. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  25. Handling Multiple Intersections… • A more complex problem • Each hash table contains two anomalies now → two culprit keys… • How do we take intersections now? • [Illustration]

  26. Handling Multiple Intersections… • Multiple possibilities… • Take the union of the reverse-mapped key sets within each hash table, then intersect across tables → false positives
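A small worked example shows why union-then-intersect produces false positives. Everything here is a toy assumption: a key space of 16 keys and two hash tables with the hypothetical hash functions k mod 4 and (k div 4) mod 4.

```python
KEYS = range(16)  # toy key space (assumption)

# Two hypothetical hash tables with 4 buckets each.
HASHES = [
    lambda k: k % 4,         # table 1
    lambda k: (k // 4) % 4,  # table 2
]


def union_then_intersect(culprits):
    """Union the reverse sets of all anomalous buckets per table, then intersect."""
    candidates = set(KEYS)
    for h in HASHES:
        anomalous_buckets = {h(k) for k in culprits}
        union = {k for k in KEYS if h(k) in anomalous_buckets}
        candidates &= union
    return candidates


culprits = {1, 14}
candidates = union_then_intersect(culprits)  # {1, 2, 13, 14}
false_positives = candidates - culprits      # {2, 13}
```

Keys 2 and 13 survive because each lands in an anomalous bucket in both tables, but the buckets belong to different culprits: key 2 shares its table-1 bucket with 14 and its table-2 bucket with 1.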

  27. Handling Multiple Intersections… • Multiple possibilities…. • Try all possible combinations of intersections…. • Expensive and inaccurate(?)

  28. Handling Multiple Intersections… • Bucket Vector Algorithm: a new algo • Efficient • Similar to all possible intersections but takes polynomial time • Documented in our technical report

  29. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  30. Evaluation • Got traffic traces from a large ISP • Each 5 min interval → 7.5 GB of traces • Used the Change Detection Method described earlier

  31. Evaluation • Efficacy depends on the number of heavy changers • That number depends on the change threshold • Lower threshold → larger number of heavy changes • To verify our results, used a naïve multi-pass algo → the ground truth
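A ground-truth check of this kind could look like the sketch below. It is an assumption-laden illustration: the slides do not describe the naïve algorithm, so this version simply keeps exact per-key totals for two intervals in memory and flags keys whose absolute change meets a threshold. The function names, the (key, value) record format, and the sample data are all hypothetical.

```python
from collections import Counter


def heavy_changes(prev_interval, curr_interval, threshold):
    """Exact heavy changers: keys whose total changed by at least `threshold`."""
    prev, curr = Counter(), Counter()
    for key, value in prev_interval:
        prev[key] += value
    for key, value in curr_interval:
        curr[key] += value
    # Counter returns 0 for keys missing from one of the intervals.
    return {
        k for k in prev.keys() | curr.keys()
        if abs(curr[k] - prev[k]) >= threshold
    }


def error_counts(reported, truth):
    """Return (false positives, false negatives) of a reported key set."""
    return len(reported - truth), len(truth - reported)


# Made-up (key, bytes) records for two consecutive intervals.
prev_records = [("a", 10), ("b", 5), ("c", 1)]
curr_records = [("a", 12), ("b", 50), ("d", 40)]

truth = heavy_changes(prev_records, curr_records, threshold=30)
fp, fn = error_counts({"b", "x"}, truth)
```

Lowering `threshold` enlarges `truth`, which matches the slide's point that a lower threshold yields more heavy changes.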

  32. Our methods are quite effective • Detection is quite accurate, even with up to 20 heavy changes • Very few false positives and false negatives

  33. The bucket vector algorithm is important • For multiple changes, the method of intersection is quite important • E.g. w/o bucket vector algorithm:

  34. We can make the sketch more accurate • Use 6 hash tables instead of 5 • Makes intersections very accurate, with fewer false negatives

  35. Conclusions • Sketches are a powerful method for scalable change detection • Our main contribution: we can reverse them • Greatly enhances their applicability in online systems • We can extract heavy changes from the sketches without storing any key information • Methods are accurate • Low number of false positives and false negatives • Methods are efficient • Runtime: only poly-logarithmic in key space • Space: logarithmic in key space

  36. Overview • Anomaly Detection • Sketch Based Approaches and their problems • Reverse Hashing algorithms • Dealing with Multiple Anomalies • Evaluation • Conclusions • Future Work

  37. Future Work: Three areas • Application to Online real-time systems • Performance evaluation • Hardware design of our methods • More advanced applications: • Hierarchical change detection • Output the prefix changes not just the key changes ! • E.g. 129.105.100.* shows a big change ! • Advanced change detection methods: • Statistical methods
