On Understanding of Transient Interdomain Routing Failures
Feng Wang, Lixin Gao, Jia Wang, and Jian Qiu
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01002
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07869
Outline
• What are transient routing failures?
• When can transient routing failures occur?
• How long can transient routing failures last?
• Measurement results
Internet Routing
• Autonomous systems (ASes)
  • Internet Service Providers (ISPs)
  • Companies
  • Universities
• Intradomain routing protocols
  • Static routing, OSPF, IS-IS
• Interdomain routing protocol
  • Border Gateway Protocol (BGP)
Long Convergence Delay
• Long convergence delay (Labovitz et al., TON 2001)
  • Bringing a route back (Tup): < shortest path length × MRAI
  • Disconnecting a route (Tdown): < longest path length × MRAI
• Fail-over: rerouting from Path A to Path B
  • While Path B is being discovered, routers might experience transient routing failures, i.e., no route is available
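As a rough illustration of the two bounds above, the sketch below multiplies a path length (in AS hops) by the MRAI value. The function name `convergence_bound` and the example path lengths are hypothetical and chosen only for illustration; the 30-second default is the eBGP MRAI value listed on the MRAI Timers slide.

```python
def convergence_bound(path_length_hops: int, mrai_seconds: float = 30.0) -> float:
    """Return an upper bound (in seconds) of the form: path length x MRAI."""
    if path_length_hops < 0 or mrai_seconds < 0:
        raise ValueError("path length and MRAI must be non-negative")
    return path_length_hops * mrai_seconds


# Hypothetical topology: shortest path of 2 AS hops, longest path of 5 AS hops.
t_up_bound = convergence_bound(2)    # bound on Tup (bringing a route back)
t_down_bound = convergence_bound(5)  # bound on Tdown (disconnecting a route)

print(f"Tup   < {t_up_bound:.0f} s")
print(f"Tdown < {t_down_bound:.0f} s")
```

With these assumed path lengths the bounds come out to 60 and 150 seconds, which shows why a fail-over that depends on several MRAI rounds can leave a router without a route for a noticeable time.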
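An Example of Transient Routing Failure
[Figure: AS0, AS1, AS2, and AS3 with destination d. Routing-table entries show AS paths (1 0), (2 0), (1 2 0), and (2 1 0); BGP updates are labeled W: 2 0 and A: 1 0; one AS is marked as losing reachability. Legend: traffic on data plane, BGP update, BGP routing table.]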
Our Contributions
• Identify transient routing failures
• Sufficient conditions
• Bound transient routing failure duration
When Can Transient Routing Failures Occur?
• Two sufficient conditions under which a node must experience a transient routing failure (a certain transient routing failure).
• One sufficient condition under which a node may experience a transient routing failure (a potential transient routing failure).
[Figure: example topology with nodes 0, 1, 2, and 3; AS paths (1 0), (2 0), (2 1 0), and (3 1 0); two labels w.]
When Can Transient Routing Failures Occur? (contd.)
[Figure: example topology with nodes 0, 1, 2, and 3; AS paths (1 0), (2 0), (2 1 0), (3 1 0), and (3 2 0); labels w and A.]
How Long Do Transient Routing Failures Last?
[Figure: nodes 0, 1, and 2 with destination d; AS paths (1 0), (1 2 0), and (2 1 0); BGP updates labeled W: 2 0 and A: 1 0; two MRAI timers marked.]
MRAI Timers
• Minimum Route Advertisement Interval (MRAI) timer
  • Minimum amount of time that must elapse between routing updates
  • Applied to BGP announcements or withdrawals
• Default MRAI value
  • eBGP session: 30 seconds
  • iBGP session: 5 seconds
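The rate-limiting behavior described above can be sketched as a toy timer that delays an update until the MRAI interval has elapsed since the previous one. This is a simplified illustration, not how any particular router implements it: the class `MraiTimer` and its methods are hypothetical names, time is passed in explicitly as seconds, and real implementations typically keep timers per peer or per prefix and add jitter.

```python
from typing import Optional

EBGP_MRAI_SECONDS = 30.0  # default for eBGP sessions, per the slide
IBGP_MRAI_SECONDS = 5.0   # default for iBGP sessions, per the slide


class MraiTimer:
    """Toy rate limiter: at most one update per MRAI interval."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.last_sent: Optional[float] = None  # time the last update went out

    def send(self, now: float) -> float:
        """Return the time at which an update generated at `now` is actually sent."""
        if self.last_sent is None:
            send_time = now  # nothing sent yet, so no waiting is needed
        else:
            # Wait until a full interval has passed since the previous update.
            send_time = max(now, self.last_sent + self.interval)
        self.last_sent = send_time
        return send_time


timer = MraiTimer(EBGP_MRAI_SECONDS)
first = timer.send(now=0.0)    # sent immediately
second = timer.send(now=4.0)   # held back until the 30-second interval expires
third = timer.send(now=70.0)   # interval already elapsed, sent immediately
print(first, second, third)
```

The second update is the interesting case: it is ready at 4 seconds but leaves at 30 seconds, and during that wait a neighbor that has already lost its route has no replacement.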
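Upper Bound for Transient Routing Failure Duration
• Transient routing failure duration ≤ min(d_u + d_u) × MRAI, where the indices that distinguish the two d_u terms are not legible in this transcript
[Figure: nodes u, v, and 0 with the two d_u quantities labeled.]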
Occurrence of Transient Failures in a Typical BGP System
• In a typical BGP system, transient failures are prevalent.
• Tier-1 ASes can experience transient routing failures when alternate routes come from their edge routers.
• Non-tier-1 ASes can experience transient routing failures when alternate routes are obtained from other ASes.
Measuring Transient Failures within a Tier-1 AS
• BGP updates, BGP tables, and router configuration files were collected during July 2004
[Figure: Cumulative distribution of transient failure duration]
[Figure: Percentage of transient failures among all routing failures that last less than 30 seconds]
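To make the kind of statistic plotted above concrete, the sketch below pairs each withdrawal with the next announcement for the same prefix and reports what fraction of the resulting gaps are shorter than 30 seconds. This is an assumed simplification, not the authors' procedure: the slides do not describe how failures were detected, and the study also used BGP tables and router configurations. The helper names and the update records are made up for illustration (the prefixes are documentation ranges).

```python
from typing import Dict, Iterable, List, Tuple

# (timestamp in seconds, prefix, "A" for announcement or "W" for withdrawal)
Update = Tuple[float, str, str]


def failure_durations(updates: Iterable[Update]) -> List[float]:
    """Duration between a withdrawal and the next announcement of the same prefix."""
    withdrawn_at: Dict[str, float] = {}
    durations: List[float] = []
    for timestamp, prefix, kind in sorted(updates):
        if kind == "W":
            # Remember only the first withdrawal of an outage.
            withdrawn_at.setdefault(prefix, timestamp)
        elif kind == "A" and prefix in withdrawn_at:
            durations.append(timestamp - withdrawn_at.pop(prefix))
    return durations


def fraction_shorter_than(durations: List[float], threshold_seconds: float) -> float:
    """Empirical fraction of durations strictly below the threshold."""
    if not durations:
        return 0.0
    return sum(d < threshold_seconds for d in durations) / len(durations)


# Made-up update log, for illustration only.
updates = [
    (0.0, "192.0.2.0/24", "W"),
    (12.0, "192.0.2.0/24", "A"),
    (5.0, "198.51.100.0/24", "W"),
    (95.0, "198.51.100.0/24", "A"),
    (200.0, "192.0.2.0/24", "W"),
    (228.0, "192.0.2.0/24", "A"),
]

durations = failure_durations(updates)
print(durations)
print(fraction_shorter_than(durations, 30.0))
```

Evaluating `fraction_shorter_than` at a range of thresholds gives the points of an empirical cumulative distribution like the one in the first figure.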
Measuring Transient Failures (contd.)
• Transient failures in tier-2 ASes, measured using Oregon Route Views BGP updates (July 2004)
Popularity of Prefixes Experiencing Transient Failures
• We aggregate the NetFlow data collected in the tier-1 AS during the week of 1/2/2005 to 1/8/2005
• Transient routing failures can affect both popular and unpopular prefixes
[Figure: Fraction of transient routing failures]
Conclusions
• Transient routing failures are prevalent in the Internet and can last for a significant period of time.
• The majority of transient failures occur under the commonly applied routing policy setting.
• Both popular and unpopular prefixes can experience transient failures.