
Predicting and Bypassing End-to-End Internet Service Degradation



  1. Predicting and Bypassing End-to-End Internet Service Degradation Anat Bremler-Barr (Tel-Aviv University), Edith Cohen (AT&T Labs), Haim Kaplan (Tel-Aviv University), Yishay Mansour (Tel-Aviv University). Talk by Omer Ben-Shalom (Tel-Aviv University)

  2. Outline: • Degradation • deviation from “normal” (minimum) RTT. • Predicting Degradation: • Different Predictors • Performance Evaluation: • Precision/recall methodology • Suggested Application: Gateway selection

  3. Motivating Application [diagram: ASes 41, 123, 56, and 12 connected by peering links, with an intelligent routing device choosing among them] • Gateway selection (intelligent routing device) • Choosing peering links

  4. Data and Measurements: Sources • ACIRI (CA2) • AT&T (CA1) • AT&T (NJ1) • Princeton (NJ2) • Base measurements from 4 different locations (ASes), simulating 4 gateways: • California (CA): AT&T + ACIRI • New Jersey (NJ): AT&T + Princeton

  5. Data and Measurements: Destinations • Obtaining a representative set of web servers + weights (derived from proxy logs)

  6. Data and Measurements: RTT • Data: weekly end-to-end RTT measurements (SYN; path + server) • Hourly measurements → 35,124 servers • Once-a-minute weighted sample measurements → 100 servers

  7. Degradation: Definition • Deviation from minimum recorded RTT (propagation delay) • Discrete degradation levels 1-6.
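The slide defines degradation as deviation from the minimum recorded RTT, bucketed into discrete levels 1-6. A minimal sketch of such a bucketing follows; the ratio thresholds below are illustrative assumptions, as the talk does not give the exact cutoffs:

```python
def degradation_level(rtt_ms, min_rtt_ms,
                      thresholds=(1.5, 2.0, 4.0, 8.0, 16.0, 32.0)):
    """Map an RTT sample to a discrete degradation level 0-6.

    Level 0 means no degradation; level i (1-6) means the sample
    exceeded min_rtt * thresholds[i-1].  The minimum recorded RTT
    approximates the propagation delay.  The threshold ratios are
    illustrative, not taken from the paper.
    """
    ratio = rtt_ms / min_rtt_ms
    level = 0
    for i, t in enumerate(thresholds, start=1):
        if ratio >= t:
            level = i
    return level
```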

  8. Objective: Avoiding degradation • Attempt to reroute through a different gateway • Two conditions must hold: • Need to be able to predict the failure from a gateway • Need to have a substitute gateway (low correlation between gateways) • Blackout = consecutive degradation through one gateway

  9. Blackout durations • The longer the duration, the easier to predict. • The majority of blackouts are short: 1-3 consecutive points. • However, a considerable fraction of degraded points occur in long-duration blackouts.

  10. Gateways Correlation • Gateways are correlated but often the correlation is not too strong

  11. Gateways Correlation • Longer blackouts are more likely to be shared • failure closer to the server • The majority of 2-gateway blackouts involved same-coast pairs

  12. Building predictors • For a given degradation level l • Prediction per IP • Input: previous RTT measurements for the IP address • Output: probability of a failure • Predict "failure" if probability > Ф

  13. Precision/Recall Methodology • Precision = |actual degraded ∩ predicted degraded| / |predicted degraded| • Recall = |actual degraded ∩ predicted degraded| / |actual degraded|

  14. Precision-recall curve • Sweep the threshold Ф over [0,1] to obtain a precision-recall curve. • In other words, let P(t) be the predicted failure probability at time t; predict a failure at time t iff P(t) > Ф.
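The threshold sweep above can be sketched as follows; the function name and list-based representation are illustrative, not from the paper:

```python
def precision_recall_curve(probs, actual):
    """Sweep the threshold Phi over the predicted probabilities and
    return a list of (precision, recall) points.

    probs[t]  -- P(t), the predicted failure probability at time t
    actual[t] -- 1 if a failure actually occurred at time t, else 0
    """
    total_failures = sum(actual)
    points = []
    for phi in sorted(set(probs)):
        predicted = [p >= phi for p in probs]
        tp = sum(1 for pr, a in zip(predicted, actual) if pr and a)
        n_pred = sum(predicted)
        if n_pred == 0:
            continue  # no predictions at this threshold
        points.append((tp / n_pred, tp / total_failures))
    return points
```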

  15. What is important for prediction? • Recency principle • The more recent RTTs are more important. • Quantity Principle • The more measurements the higher the accuracy.

  16. Recency Principle: Importance • Test case: single-measurement predictor • predict according to a measurement from x minutes ago • observe the change in the quality of the prediction → ~15% difference between using the last-minute measurement and the 15-minutes-ago measurement

  17. Quantity Principle: Importance • Test case: Fixed-Window-Count (FWC) • the prediction is the fraction of failures among the W most recent measurements → With more measurements we can achieve better precision at high recall (curves: FWC 1, FWC 5, FWC 10, FWC 50)
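A minimal sketch of the FWC predictor described above; the class and method names are illustrative:

```python
from collections import deque

class FixedWindowCount:
    """Fixed-Window-Count (FWC) predictor: the predicted failure
    probability is the fraction of failures among the W most recent
    measurements."""

    def __init__(self, window):
        # deque with maxlen keeps only the W most recent observations
        self.window = deque(maxlen=window)

    def observe(self, failed):
        self.window.append(1 if failed else 0)

    def predict(self):
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)
```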

  18. Our predictors • Exponential Decay • Polynomial Decay • Model-based predictors: • VW-Cover: Variable Window Cover algorithm • HMM: Hidden Markov Model

  19. Exponential-decay predictors • The weight of each measurement decreases exponentially with its age, by a factor λ per step. • The binary variable f_t represents a failure at time t. • For consecutive measurements: p_t = λ·p_{t−1} + (1−λ)·f_t • In general, the prediction is the λ^age-weighted average of past f values, normalized by the total weight.
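A minimal sketch of the exponential-decay predictor for consecutive (once-per-step) measurements, assuming the standard recurrence p_t = λ·p_{t−1} + (1−λ)·f_t; class and parameter names are illustrative:

```python
class ExpDecay:
    """Exponentially decayed failure-rate predictor.

    Each past measurement's weight decays by a factor lam per time
    step; for consecutive measurements this reduces to the recurrence
        p_t = lam * p_{t-1} + (1 - lam) * f_t
    so only the running value p needs to be stored.
    """

    def __init__(self, lam=0.95):
        self.lam = lam
        self.p = 0.0

    def observe(self, failed):
        f = 1.0 if failed else 0.0
        self.p = self.lam * self.p + (1.0 - self.lam) * f

    def predict(self):
        return self.p
```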

  20. Polynomial-decay predictors • The weight of each measurement decays polynomially with its age. • Exact computation requires maintaining the complete history. • We approximated it.

  21. The VW-Cover predictor • Consists of a list of pairs (a1, b1), (a2, b2), …, (an, bn) • Predict a failure if there exists i such that at least bi of the previous ai measurements were failures

  22. VW-Cover predictor: Building • Build the predictor greedily to cover the failures, using a learning set of measurements • Pick (a1, b1) to be the pair that maximizes precision • Pick (ai, bi) to be the pair that maximizes precision over the still-uncovered failures
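The VW-Cover prediction rule can be sketched as below. Only the prediction step is shown; the greedy construction of the (a, b) pairs on a learning set is omitted, and here the pairs are supplied directly. Names are illustrative:

```python
class VWCover:
    """Variable-Window Cover predictor.

    Holds a list of pairs (a_i, b_i); predicts a failure if for some
    i at least b_i of the last a_i measurements were failures.  In
    the paper the pairs are chosen greedily on a learning set; this
    sketch takes them as given.
    """

    def __init__(self, pairs):
        self.pairs = pairs      # e.g. [(5, 3), (50, 20)] -- illustrative
        self.history = []

    def observe(self, failed):
        self.history.append(1 if failed else 0)

    def predict(self):
        for a, b in self.pairs:
            # count failures among the a most recent measurements
            if sum(self.history[-a:]) >= b:
                return True
        return False
```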

  23. Hidden Markov Model • A finite set of states S (we use 3 states) • Output probabilities a_s(0), a_s(1) • A transition function determines the probability distribution of the next state • The probability of a failure: Σ_s p_s(t)·a_s(1), where p_s(t) is the probability of being in state s at time t; p_s(t) is updated according to the output at time t−1.
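The forward update described above, as a minimal sketch: condition the state distribution on the last output, apply the transition matrix, then read off the failure probability. Function names and the 2-state test setup are illustrative (the talk uses 3 states):

```python
import numpy as np

def hmm_failure_prob(p, a_fail):
    """Predicted failure probability: sum over states s of
    p_s(t) * a_s(1), where a_fail[s] = a_s(1)."""
    return float(p @ a_fail)

def hmm_update(p, A, a_out, obs):
    """Forward update of the state distribution after observing
    obs (0 or 1) at time t-1.

    p        -- current state distribution, shape (S,)
    A        -- transition matrix, A[i, j] = P(next=j | cur=i)
    a_out    -- output probabilities, a_out[s, o] = a_s(o)
    """
    emit = a_out[:, obs]          # likelihood of obs in each state
    posterior = p * emit
    posterior = posterior / posterior.sum()
    return posterior @ A          # propagate one step forward
```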

  24. Experimental Evaluation

  25. Predictor Performance – Level 3 (curves: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM) → At recall 0.5, precision is close to 0.9

  26. Predictor Performance – Level 6 (curves: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM) • Level-6 degradations are harder to predict: at recall 0.5, precision is about 0.4

  27. Predictor Performance: Conclusion • The best predictors at levels 3 and 6 are VW-Cover and HMM • But they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement

  28. Gateway Selection (plots for level 3 and level 6)

  29. Gateway Selection: Conclusion • Active gateway selection reduced the degradation rate by 50% relative to the best single gateway. • Static gateway selection can avoid at most 25% of degradations. • Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
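The active selection policy evaluated here amounts to routing each destination through the gateway whose predictor currently reports the lowest failure probability. A minimal sketch, with illustrative names and a tie-break toward the current default gateway (an assumption, to avoid needless rerouting):

```python
def select_gateway(predicted_failure, default):
    """Pick the gateway with the lowest predicted failure probability
    for this destination.

    predicted_failure -- dict mapping gateway name -> probability in [0, 1]
    default           -- gateway preferred on ties (assumed tie-break rule)
    """
    return min(predicted_failure,
               key=lambda g: (predicted_failure[g], g != default))
```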

  30. Performance of gateway selection as a function of recency

  31. Correlation between coasts • Gateway selection over a same-coast pair yielded only a 10% reduction. • Choose independent gateways.

  32. Controlling prediction overhead • Types of measurements: • Active measurements: initiate probes (SYN, ping, HTTP request); scalability problem. • Passive measurements: collected on regular traffic. • Controlling the prediction overhead: • Use less-recent measurements • Direct active measurements only at a small set of destinations that covers the majority of traffic • Cluster destinations: the measurements of one destination can be used to predict another.

  33. Questions ?? natali@cs.tau.ac.il edith@research.att.com haimk@cs.tau.ac.il mansour@cs.tau.ac.il
