This study addresses Internet service degradation, characterized by deviations from normal RTT (Round-Trip Time) levels. It explores various predictors to anticipate degradation, evaluates performance through precision/recall methodologies, and discusses intelligent routing applications for gateway selection. The work utilizes data from diverse sources to simulate scenarios and develop accurate prediction models, including exponential decay and Hidden Markov Models. The objective is to enhance network reliability by rerouting traffic effectively during deterioration events.
Predicting and Bypassing End-to-End Internet Service Degradation • Anat Bremler-Barr, Edith Cohen, Haim Kaplan, Yishay Mansour (Tel-Aviv University; AT&T Labs; Tel-Aviv University) • Talk: Omer Ben-Shalom, Tel-Aviv University
Outline: • Degradation • deviation from “normal” (minimum) RTT. • Predicting Degradation: • Different Predictors • Performance Evaluation: • Precision/recall methodology • Suggested Application: Gateway selection
Motivating Application • Gateway selection (intelligent routing device) • Choosing peering links [Figure: an intelligent routing device choosing among peering links; ASes shown: AS 41, AS 123, AS 56, AS 12]
Data and Measurements: Sources • ACIRI (CA2) • AT&T (CA1) • AT&T (NJ1) • Princeton (NJ2) • Base measurements from 4 different locations (ASes) simulated 4 gateways: • California (CA): AT&T + ACIRI • New Jersey (NJ): AT&T + Princeton
Data and Measurements: Destinations • Obtained a representative set of web servers, with weights derived from a proxy log
Data and Measurements: RTT • Data: weekly RTT (SYN), measured end to end (path + server) • Hourly measurements: 35,124 servers • Once-a-minute measurements of a weighted sample: 100 servers
Degradation: Definition • Deviation from minimum recorded RTT (propagation delay) • Discrete degradation levels 1-6.
Objective: Avoiding degradation • Attempt to reroute through a different gateway • Two conditions have to hold: • We need to be able to predict a failure through a gateway • We need to have a substitute gateway (low correlation between gateways) • Blackout: consecutive degradation through one gateway
Blackout durations • The longer the blackout, the easier it is to predict. • The majority of blackouts are short: 1-3 consecutive points. • However, a considerable fraction of blackouts last longer.
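As a small illustration of the blackout definition (consecutive degraded points through one gateway), the sketch below extracts blackout durations from a series of 0/1 degradation flags. The function name and the sample series are hypothetical and not taken from the talk.

```python
from itertools import groupby


def blackout_durations(degraded):
    """Return the length of every maximal run of consecutive degraded points.

    `degraded` is a sequence of 0/1 (or False/True) flags for one gateway
    and one destination, ordered by time.
    """
    return [sum(1 for _ in run) for is_bad, run in groupby(degraded) if is_bad]


# Hypothetical series: three blackouts of lengths 2, 4 and 1.
samples = [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1]
print(blackout_durations(samples))  # [2, 4, 1]
```

A histogram of these lengths is one way to see how many blackouts fall in the short 1-3 point range.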
Gateways Correlation • Gateways are correlated but often the correlation is not too strong
Gateways Correlation • Longer blackouts more likely to be shared • failure closer to the server • Majority of 2-gateways blackouts involved same-coast pairs
Building predictors • For a given degradation level l. • Prediction per IP address. • Input: previous RTT measurements for the IP address. • Output: probability of a failure. • Predict “failure” if the probability > Ф
Precision/Recall Methodology • Precision = |actual degraded ∩ predicted degraded| / |predicted degraded| • Recall = |actual degraded ∩ predicted degraded| / |actual degraded|
Precision-recall curve • Sweep the threshold Ф in [0,1] to obtain a precision-recall curve. • In other words, let P(t) be the predicted failure probability at time t; a failure is predicted at time t when P(t) > Ф.
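The threshold sweep can be sketched as follows. This is a minimal illustration, assuming the predictor outputs and the ground-truth flags are available as parallel lists; the function names, the number of sweep steps, and the choice to report precision as `None` when nothing is predicted are assumptions of this sketch.

```python
def precision_recall(probabilities, actual, threshold):
    """Precision and recall when predicting a failure whenever P(t) > threshold.

    Returns None for a ratio whose denominator is zero.
    """
    predicted = [p > threshold for p in probabilities]
    true_pos = sum(1 for pred, act in zip(predicted, actual) if pred and act)
    n_predicted = sum(predicted)
    n_actual = sum(1 for act in actual if act)
    precision = true_pos / n_predicted if n_predicted else None
    recall = true_pos / n_actual if n_actual else None
    return precision, recall


def pr_curve(probabilities, actual, steps=10):
    """Sweep the threshold over [0, 1] and collect (threshold, precision, recall)."""
    curve = []
    for i in range(steps + 1):
        threshold = i / steps
        precision, recall = precision_recall(probabilities, actual, threshold)
        curve.append((threshold, precision, recall))
    return curve


# Hypothetical predictor outputs and ground truth.
probs = [0.9, 0.8, 0.3, 0.1]
truth = [1, 0, 1, 0]
for point in pr_curve(probs, truth, steps=5):
    print(point)
```

Lowering the threshold generally raises recall at the cost of precision, which is the trade-off the curve makes visible.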
What is important for prediction? • Recency principle • The more recent RTTs are more important. • Quantity Principle • The more measurements the higher the accuracy.
Recency Principle: Importance • Test case: single-measurement predictor • Predict according to the measurement taken x minutes ago. • Observe the change in the quality of the prediction. • Result: a 15% difference between using the measurement from the last minute and the measurement from 15 minutes ago.
Quantity Principle: Importance • Test case: Fixed-Window-Count (FWC) • The prediction is the fraction of failures in the W most recent measurements. • With more measurements we can achieve better precision at high recall. [Plot legend: FWC 1, FWC 5, FWC 10, FWC 50]
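A minimal sketch of a Fixed-Window-Count predictor, following the description above: the prediction is the fraction of failures in the W most recent measurements. The class name is hypothetical, and the behavior before W measurements have been seen (dividing by the number available, returning 0.0 with no history) is an assumption of this sketch.

```python
from collections import deque


class FixedWindowCount:
    """Predict failure probability as the failure fraction in the last W measurements."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=W) silently drops the oldest measurement once full.
        self.history = deque(maxlen=window)

    def update(self, failed):
        """Record one measurement: True/1 for a failure, False/0 otherwise."""
        self.history.append(1 if failed else 0)

    def predict(self):
        """Fraction of failures in the window (assumption: 0.0 with no history)."""
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)


fwc = FixedWindowCount(window=5)
for outcome in [1, 0, 0, 1, 1, 1]:
    fwc.update(outcome)
print(fwc.predict())  # window holds [0, 0, 1, 1, 1], so 0.6
```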
Our predictors • Exponential Decay • Polynomial Decay • Model based Predictors: • VW-cover : Variable Window Cover algorithm • HMM : Hidden Markov Model
Exponential-decay predictors • The weight of each measurement decreases exponentially with its age, by a factor λ. • The binary variable f_t represents a failure at time t. • For consecutive measurements: [formula not preserved] • In general: [formula not preserved]
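One common way to realize exponentially decaying weights for evenly spaced measurements is an exponentially weighted moving average. The sketch below uses that form; it is an assumption of this example, not the talk's exact formula, which is not reproduced here. The class name and the initial estimate are likewise hypothetical.

```python
class ExpDecayPredictor:
    """Exponentially weighted failure estimate (assumed EWMA form).

    After each measurement f_t in {0, 1}:
        estimate = decay * estimate + (1 - decay) * f_t
    so a measurement that is k steps old carries weight proportional to decay**k.
    """

    def __init__(self, decay=0.95, initial=0.0):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be strictly between 0 and 1")
        self.decay = decay
        self.estimate = initial

    def update(self, failed):
        """Fold in one measurement and return the new failure estimate."""
        f_t = 1.0 if failed else 0.0
        self.estimate = self.decay * self.estimate + (1.0 - self.decay) * f_t
        return self.estimate


# decay=0.5 keeps the arithmetic easy to follow by hand.
predictor = ExpDecayPredictor(decay=0.5)
print(predictor.update(True))   # 0.5
print(predictor.update(True))   # 0.75
print(predictor.update(False))  # 0.375
```

A decay closer to 1 (such as 0.95 or 0.99) gives older measurements more influence, while a smaller decay emphasizes the most recent ones. The predictor needs only one stored number per destination.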
Polynomial-decay predictors • Exact computation requires maintaining the complete history. • We approximate it.
The VW-Cover predictor • Consists of a list of pairs (a1, b1), (a2, b2), …, (an, bn) • Predict a failure if there exists an i such that there are at least bi failures among the previous ai measurements
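The prediction rule can be sketched directly from its definition. The function name and the example pairs are hypothetical; the history is assumed to be a list of 0/1 failure flags with the most recent measurement last.

```python
def vw_cover_predict(pairs, history):
    """Predict a failure if some pair (a, b) sees at least b failures
    among the previous a measurements.

    `history` holds 0/1 failure flags, most recent last. If fewer than
    a measurements exist, all available ones are used (an assumption).
    """
    for a, b in pairs:
        window = history[-a:] if a > 0 else []
        if sum(window) >= b:
            return True
    return False


# Hypothetical predictor: 2 failures in the last 3, or 4 in the last 10.
pairs = [(3, 2), (10, 4)]
print(vw_cover_predict(pairs, [0, 0, 1, 0, 1]))        # True via (3, 2)
print(vw_cover_predict(pairs, [1, 0, 0, 0, 1]))        # False
print(vw_cover_predict(pairs, [1, 1, 1, 0, 0, 0, 1]))  # True via (10, 4)
```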
VW-Cover predictor: Building • Build the predictor greedily to cover the failures. • Use a learning set of measurements • Pick ( a1 , b1 ) to be the pair which maximizes precision • Pick ( ai , bi ) to be the pair which maximizes precision among uncovered failures
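The greedy construction might look roughly like the sketch below. Several details are assumptions of this example rather than statements from the talk: the candidate grid (all pairs with a up to `max_window`), treating a time point as covered once any chosen pair fires on it, breaking precision ties by the number of failures covered, and stopping when the best remaining precision drops below `min_precision`.

```python
def fires(pair, failures, t):
    """True if pair (a, b) sees at least b failures in the a points before t."""
    a, b = pair
    return sum(failures[max(0, t - a):t]) >= b


def build_vw_cover(failures, max_window=5, min_precision=0.5):
    """Greedily pick (a, b) pairs that cover failures in a learning set.

    `failures` is a list of 0/1 flags ordered by time.
    """
    candidates = [(a, b) for a in range(1, max_window + 1) for b in range(1, a + 1)]
    # Time points on which no chosen pair fires yet.
    uncovered = set(range(1, len(failures)))
    chosen = []
    while True:
        best, best_key, best_hits = None, (0.0, 0), set()
        for pair in candidates:
            hits = {t for t in uncovered if fires(pair, failures, t)}
            covered = sum(failures[t] for t in hits)
            if covered == 0:
                continue  # the pair covers no remaining failure
            key = (covered / len(hits), covered)  # precision, then coverage
            if key > best_key:
                best, best_key, best_hits = pair, key, hits
        if best is None or best_key[0] < min_precision:
            return chosen
        chosen.append(best)
        uncovered -= best_hits


# Hypothetical learning set.
learning = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
print(build_vw_cover(learning))  # [(1, 1)]
```

On this tiny learning set the only pair that reaches the 0.5 precision floor is (1, 1), "predict a failure if the previous measurement failed". Real learning sets would be far longer and would usually yield several pairs.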
Hidden Markov Model • Finite set of states S (we use 3 states) • Output probabilities a_s(0), a_s(1) for each state s • A transition function determines the probability distribution of the next state. • The probability of a failure is computed from p_s(t), the probability of being in state s at time t. • p_s(t) is updated according to the output at time t-1.
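A standard way to run such a model online is the HMM forward (filtering) recursion, sketched below. The failure probability is taken to be the sum over states of p_s(t) · a_s(1), with output 1 meaning failure; this is the textbook form and an assumption here, since the talk's own formula is not reproduced. All parameter values are invented for illustration; in practice they would be learned from measurements.

```python
def hmm_failure_probability(belief, emit):
    """Assumed form: P(failure at t) = sum over states s of p_s(t) * a_s(1).

    belief[s] is p_s(t); emit[s] is the pair (a_s(0), a_s(1)).
    """
    return sum(p * a[1] for p, a in zip(belief, emit))


def hmm_update(belief, transition, emit, observed):
    """Condition the belief on the observed output (0 or 1), then advance one step.

    transition[i][j] is the probability of moving from state i to state j.
    """
    weighted = [p * a[observed] for p, a in zip(belief, emit)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    posterior = [w / total for w in weighted]
    n = len(belief)
    return [sum(posterior[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-state model: "good", "flaky", "bad".
transition = [[0.90, 0.08, 0.02],
              [0.20, 0.60, 0.20],
              [0.05, 0.15, 0.80]]
emit = [(0.98, 0.02), (0.60, 0.40), (0.10, 0.90)]
belief = [1 / 3, 1 / 3, 1 / 3]

for observed in [0, 1, 1]:
    belief = hmm_update(belief, transition, emit, observed)
    print(round(hmm_failure_probability(belief, emit), 3))
```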
Predictor Performance – Level 3 • At recall 0.5, precision is close to 0.9. [Plot legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]
Predictor Performance – Level 6 • Level-6 degradations are harder to predict: at recall 0.5, precision is 0.4. [Plot legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]
Predictor Performance: Conclusion • The best predictors at levels 3 and 6 are VW-Cover and HMM. • But they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement.
Gateway Selection [Plots: Level 6 and Level 3]
Gateway Selection: Conclusion • Active gateway selection resulted in a 50% reduction in the degradation rate relative to the best single gateway. • Static gateway selection can avoid at most 25% of degradations. • Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
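The talk does not spell out the exact selection policy, so the following is only one plausible sketch of active gateway selection: stay on the current gateway unless its predicted failure probability exceeds the threshold, and otherwise move to the gateway with the lowest predicted probability. The function name, the policy itself, and the example numbers are all assumptions.

```python
def select_gateway(predictions, threshold, current):
    """Pick a gateway for one destination.

    `predictions` maps gateway name -> predicted failure probability.
    Assumed policy: keep `current` while its prediction is at or below
    `threshold`; otherwise switch to the gateway with the lowest prediction.
    """
    if predictions[current] <= threshold:
        return current
    return min(predictions, key=predictions.get)


# Hypothetical predictions for one destination through three gateways.
predictions = {"CA1": 0.7, "NJ1": 0.2, "NJ2": 0.4}
print(select_gateway(predictions, threshold=0.5, current="CA1"))  # NJ1
print(select_gateway(predictions, threshold=0.5, current="NJ2"))  # NJ2
```

Staying put while the current gateway looks healthy avoids needless route changes; whether that matches the evaluated policy is not stated in the slides.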
Correlation between coasts • Gateway selection on a same-coast pair resulted in only a 10% reduction. • Choose independent gateways.
Controlling prediction overhead • Types of measurements: • Active measurements: • Initiate probes (SYN, ping, HTTP request). • Scalability problem. • Passive measurements: • Collected on regular traffic. • Controlling the prediction overhead: • Use less-recent measurements. • Send active measurements only to a small set of destinations that covers the majority of traffic. • Cluster destinations: the measurements of one destination can be used to predict another.
Questions ?? natali@cs.tau.ac.il edith@research.att.com haimk@cs.tau.ac.il mansour@cs.tau.ac.il