Surviving Large Scale Internet Failures

Dr. Krishna Kant, Intel Corp. K. Kant, Surviving Large Scale Internet Failures, DSN 2007 Tutorial

The Problem • Internet has two critical elements • Routing (Inter & intra domain) • Name resolution • How robust are they against large scale failures/attacks? • How do we improve them?

Outline • Overview • Internet Infrastructure elements & Large Scale Failures • Dealing with Routing Failures • Routing algorithms & their properties • Improving BGP Convergence • Other Performance Metrics • Dealing with Name Resolution Failures • Name resolution preliminaries • DNS vulnerabilities & Solution • Conclusions and Open Issues

Internet Routing • Not a homogeneous network • A network of autonomous systems (ASes) • Each AS under the control of an ISP. • Large variation in AS sizes – typical heavy tail. • Inter-AS routing • Border Gateway Protocol (BGP). A path-vector algorithm. • Serious scalability/recovery issues. • Intra-AS routing • Several algorithms; usually work fine • Central control, smaller network, … [Figure legend: inter-domain router, intra-domain router]

Internet Name Resolution • Domain Name System (DNS) • May add significant delays • Replication of TLDs & others resists attacks, but extensive caching makes attacks easier! • Not designed for security - can be easily attacked. • DNS security • Crypto techniques can stop many attacks, but with substantial overhead & other challenges. • Other solutions • Peer to peer based, but no solution is entirely adequate.

Large Scale Failures • Characteristics: • Affects a significant % of infrastructure in some region • Routers, Links, Name servers • Generally non-uniformly distributed, e.g., confined to a geographical area. • Why study large scale failures? • Several instances of moderate sized failures already. • Larger scale failures only a matter of time • Potentially different behavior • Secondary failures due to large recovery traffic, substantial imbalance in load, …

Routing Failure Causes • Large area router/link damage (e.g., earthquake) • Large scale failure due to buggy SW update. • High BW cable cuts • Router configuration errors • Aggregation of large un-owned IP blocks • Happens when prefixes are aggregated for efficiency • Incorrect policy settings resulting in large scale delivery failures • Network wide congestion (DoS attack) • Malicious route advertisements via worms

Name Resolution Failure Causes • Records containing (name, IP) info can be easily altered to hold fake data. • “Poisoning” of records doesn’t even require compromising the server! • Extensive caching → More points of entry. • Poisoning of TLD records (or other large chunks of name space) • Disable access to huge number of sites • Example: March 2005 .com attack • Poisoning is a perfect way to gain control of sensitive information on a large scale.

Major Infrastructure Failure Events • Blackout – widespread power outage (NY & Italy 2003) • Hurricane – widespread damage (Katrina) • Earthquake – Undersea cable damage (Taiwan Dec 2006) • Infrastructure induced (e.g., May 2007, Japan) • Many other potential causes

Taiwan Earthquake Dec 2006 • Issues: • Global traffic passes through a small number of seismically active choke points. • Luzon strait, Malacca strait, South coast of Japan • Satellite & overland cables don’t have enough capacity to provide backup. • Several countries depend on only 1-2 distinct landing points. • Outlook • Economics makes change unlikely. • May be exploited by collusion of pirates + terrorists • Will perhaps see repeat performance! • Reference: http://www.pinr.com/report.php?ac=view_report&report_id=602&language_id=1

Hurricane Katrina (Aug 2005) • Major local outages. No major regional cable routes through the worst affected areas. • Outages persisted for weeks & months. Notable after-effects in FL (significant outages 4 days later!) • Reference: http://www.renesys.com/tech/presentations/pdf/Renesys-Katrina-Report-9sep2005.pdf

NY Power Outage (Aug 2003) • [Figure: No. of concurrent network outages vs. time] • Large ASes suffered less than smaller ones. • Behavior very similar to Italian power outage of Sept 2003. • A significant no. of ASes had all their routers down for >4 hours.

Slammer Worm (Jan 2003) • Scanner worm started w/ buffer overflow of MS SQL. • Very rapid replication, huge congestion buildup in 10 mins • Korea falls out, 5/13 DNS root servers fail, failed ATMs, … • High BGP activity to find working routes. • Reference: http://www.cs.ucsd.edu/~savage/papers/IEEESP03.pdf

Infrastructure Induced Failures • En-masse use of backup routes by 4000 Cisco routers in May 2007 (Japan) • Routing table rewrites caused 7 hr downtime in NE Japan. • Reference: http://www.networkworld.com/news/2007/051607-cisco-routers-major-outage-japan.html • Akamai CDN failure – June 2004 • Probably widespread failures in Akamai’s DNS. • Reference: http://www.landfield.com/isn/mail-archive/2004/Jun/0064.html • Worldcom router mis-configuration – Oct 2002 • Misconfigured eBGP router flooded internal routers with routes. • Reference: http://www.isoc-chicago.org/internetoutage.pdf

Routing Algorithms • Basic methods • Distance vector based (DV) • Link State Based (LS) • Path Vector Based (PV) • DV Examples • RIP (Routing Information Protocol). • IGRP (Interior Gateway Routing Protocol). • LS Examples • OSPF (Open Shortest Path First) • IS-IS (Intermediate System to Intermediate System) • PV Examples • BGP (Border Gateway Protocol) • There are intra-domain (iBGP) & inter-domain (eBGP) versions.

Distance Vector (DV) Protocols • Each node advertises its path costs to its neighbors. • Very simple but “count to infinity” problem • Node w/ a broken link will receive old cost & use it to replace broken path! • Several versions to fix this. • Difficult to use policies [Figure: example network with nodes A to F and the routing table for A; labels: A-D=3, E-D=2, F-D=1, E-D=2, C-D=3]
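The advertise-and-merge step of a distance vector protocol can be sketched as follows. This is a minimal illustration, not any particular protocol's implementation; the function name `dv_update`, the table layout and the example costs are assumptions made here. It also shows why stale costs are dangerous: a node simply trusts whatever its neighbors advertise.

```python
INF = float("inf")

def dv_update(own_table, neighbor, link_cost, neighbor_table):
    """Merge one neighbor's advertised costs into own_table.

    own_table maps dest -> (cost, next_hop); neighbor_table maps dest -> cost.
    Returns True if any entry changed (hypothetical helper, for illustration).
    """
    changed = False
    for dest, adv_cost in neighbor_table.items():
        new_cost = link_cost + adv_cost
        cur_cost, cur_hop = own_table.get(dest, (INF, None))
        # Accept a strictly better path, or any new cost from the current
        # next hop (its advertisement is authoritative for our route).
        if new_cost < cur_cost or (cur_hop == neighbor and new_cost != cur_cost):
            own_table[dest] = (new_cost, neighbor)
            changed = True
    return changed

table = {}
dv_update(table, "B", 1, {"D": 2})   # A learns D via B at cost 3
dv_update(table, "C", 1, {"D": 3})   # worse offer via C is ignored
print(table)
```

Because the node never checks whether the advertised cost was itself derived from its own (now broken) route, two neighbors can keep raising each other's cost, which is the count-to-infinity behavior named on the slide.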

Link State (LS) Protocols • Each node keeps complete adjacency/cost matrix & computes shortest paths locally • Difficult in a large network • Any failure propagated via flooding • Expensive in a large network • Loop-free & can use policies easily. [Figure: example network with nodes A to E and link costs 1 to 6]
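The local shortest-path computation that each link state node performs is typically Dijkstra's algorithm. The sketch below assumes a small hypothetical topology (the link costs are invented here, since the slide's figure is not recoverable) and shows how a node recomputes its routes after a flooded failure removes a link.

```python
import heapq

def build_adj(links):
    """Turn {(u, v): cost} into a symmetric adjacency dict."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj

def shortest_paths(adj, source):
    """Dijkstra: return (dist, prev) for all nodes reachable from source."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

# Hypothetical link costs, for illustration only.
links = {("A", "B"): 3, ("A", "C"): 2, ("B", "D"): 1,
         ("B", "E"): 4, ("C", "E"): 5, ("D", "E"): 6}
dist, _ = shortest_paths(build_adj(links), "A")

# A flooded failure report for link A-B leads to a local recomputation.
del links[("A", "B")]
dist_after, _ = shortest_paths(build_adj(links), "A")
print(dist["D"], dist_after["D"])
```

Every node needs the full link database to run this, which is the scaling cost the slide points out.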

Path Vector Protocols • Each node initialized w/ some paths for each dest. • Active paths updated much like in DV • Explicitly withdraw failed paths (& advertise next best) • Filtering on incoming/outgoing paths, path selection policies • Paths A to D: • Via B: B/E/F/D, cost 3 • Via C: C/E/F/D, cost 4 [Figure: example network with nodes A to F; advertised paths to D: B/E/F/D:3, E/F/D:2, F/D:1, C/E/F/D:4; one link marked Link_cost=2]
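A rough sketch of path selection at node A, using the two paths to D listed on the slide. The function name `best_path` and the table layout are assumptions for illustration; real path vector protocols apply far richer policies. The point is that carrying the whole path allows a node to discard any path that already contains itself, and to fall back to the next best path on an explicit withdrawal.

```python
def best_path(node, rib):
    """Pick the lowest-cost loop-free path.

    rib maps neighbor -> (path tuple, cost). Paths that already contain
    `node` are rejected (loop avoidance). Returns (cost, neighbor, path)
    or None if no usable path remains.
    """
    candidates = [(cost, nbr, path)
                  for nbr, (path, cost) in rib.items()
                  if node not in path]
    return min(candidates) if candidates else None

# Paths from A to D as listed on the slide.
rib = {"B": (("B", "E", "F", "D"), 3),
       "C": (("C", "E", "F", "D"), 4)}
print(best_path("A", rib))   # path via B is preferred

del rib["B"]                 # B explicitly withdraws its path
print(best_path("A", rib))   # A falls back to the path via C
```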

Intra-domain Routing under Failures • Intra-domain routing usually can limp back to normal rather quickly • Single domain of control • High visibility, common management network, etc. • Most ASes are small • Very simple policies only • Routing algorithms • Distance-vector (RIP, IGRP) – simple, enhancements prevent most count-to-infinity problems • Link state (OSPF): Flooding handles failures quickly. • Path vector (iBGP): Behavior similar to eBGP

Inter-domain Routing • BGP: Default inter-AS protocol (RFC 1771) • Path vector protocol, runs on TCP • Scalable, “rich” policy settings • But prone to long “convergence delays” • High packet loss & delay during convergence [Figure: three ASes (AS1, AS2, AS3) with border routers and internal routers R1 to R5; E-BGP between ASes, I-BGP and IGP within an AS]

BGP Routing Table • Prefix: origin address for dest & mask (e.g., 207.8.128.0/17) • Next hop: Neighbor that announced the route • One active route, others kept as backup • Route “attributes” -- some may be conveyed outside • ASpath: Used for loop avoidance. • MED (multi-exit discriminator): preferred incoming path • Local pref: Used for local path selection

BGP Messages • Message Types • Open (sent once the TCP conn is established), notification, update, keepalive • Update • Withdraw old routes and/or advertise new ones. • Only active routes can be advertised. • May need to also advertise sub-prefix (e.g., 207.8.240.0/24 which is contained in 207.8.128.0/17) • Update message fields: Withdrawn routes length (2 octets) | Withdrawn routes (variable length) | Total length of path attributes (2 octets) | Advertised path attributes (variable length) | Reachability information (variable length)
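The sub-prefix relationship mentioned above can be checked with Python's standard `ipaddress` module. The helper `longest_match` is a hypothetical name introduced here; it illustrates why a more specific prefix matters, since forwarding prefers the longest matching prefix.

```python
import ipaddress

covering = ipaddress.ip_network("207.8.128.0/17")
specific = ipaddress.ip_network("207.8.240.0/24")

# The /24 is contained in the /17 (subnet_of needs Python 3.7+).
print(specific.subnet_of(covering))

def longest_match(addr, prefixes):
    """Return the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in prefixes if ip in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None

print(longest_match("207.8.240.9", [covering, specific]))
print(longest_match("207.8.130.1", [covering, specific]))
```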

Routing Process • Input & output policy engines • Filter routes (by attributes, prefix, etc.) • Manipulate attributes (e.g., Local pref, MED, etc.) [Figure: routes received from peers → input policy engine (accept, deny, set preferences) → BGP decision process (BGP routing table, IP routing table, route pkts) → output policy engine (forward, not forward, set MEDs) → routes sent to peers]
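The pipeline of input policy followed by a decision process can be sketched roughly as below. This is a heavily simplified illustration: the `Route` fields, the policy (raise local pref for a preferred peer) and the three-step tie-break (highest local pref, then shortest AS path, then lowest MED) are assumptions chosen for brevity, and real BGP implementations apply more steps and vendor-specific rules.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: Tuple[int, ...]
    local_pref: int = 100
    med: int = 0

def input_policy(route, my_as, preferred_peers=frozenset()) -> Optional[Route]:
    """Filter and annotate an incoming route (hypothetical policy)."""
    if my_as in route.as_path:
        return None                          # deny: AS path loop
    if route.next_hop in preferred_peers:
        return replace(route, local_pref=200)  # set preference
    return route                             # accept unchanged

def decide(routes):
    """Simplified decision: local pref, then AS path length, then MED."""
    return min(routes,
               key=lambda r: (-r.local_pref, len(r.as_path), r.med),
               default=None)

received = [
    Route("207.8.128.0/17", "peerA", (65001, 65002)),
    Route("207.8.128.0/17", "peerB", (65003,)),
    Route("207.8.128.0/17", "peerC", (65000, 65004)),  # contains our AS
]
accepted = [r for r in (input_policy(x, 65000, {"peerA"}) for x in received)
            if r is not None]
print(decide(accepted).next_hop)
```

Here the policy overrides path length: the longer path via peerA wins because the input engine raised its local preference.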

BGP Recovery • BGP Convergence Delay • Time for ALL routes to stabilize. • 4 different times defined! • BGP parameters • Minimum Route Advertisement Interval (MRAI) • Path cost, path priority, input filter, output filter, … • MRAI specifics • Applies only to adv., not withdrawals • Intended – per destination, Implemented – per peer • Damps out cycles of withdrawals & advertisements. • Convergence delay vs. MRAI: A V-shaped curve [Figure: convergence delay vs. MRAI]
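The MRAI behavior described above (per-peer timer, advertisements held back, withdrawals sent at once) can be sketched as follows. The class `MraiPeer`, its method names and the explicit `now` argument are assumptions made for this illustration; it is not modeled on any router's actual code.

```python
class MraiPeer:
    """Per-peer MRAI rate limiter (illustrative sketch)."""

    def __init__(self, mrai):
        self.mrai = mrai
        self.next_allowed = 0.0   # earliest time the next adv. may be sent
        self.pending = {}         # prefix -> latest path held back

    def advertise(self, now, prefix, path):
        """Send now if the timer allows; otherwise hold the latest path."""
        if now >= self.next_allowed:
            self.next_allowed = now + self.mrai
            return [("adv", prefix, path)]
        self.pending[prefix] = path   # a newer path replaces an older one
        return []

    def withdraw(self, now, prefix):
        """Withdrawals are not delayed by MRAI."""
        self.pending.pop(prefix, None)
        return [("wd", prefix)]

    def tick(self, now):
        """Flush held advertisements once the timer has expired."""
        if now < self.next_allowed or not self.pending:
            return []
        msgs = [("adv", p, path) for p, path in self.pending.items()]
        self.pending.clear()
        self.next_allowed = now + self.mrai
        return msgs

peer = MraiPeer(mrai=30)
print(peer.advertise(0, "C", ("B", "F", "E", "A", "C")))       # sent
print(peer.advertise(5, "C", ("B", "I", "H", "G", "A", "C")))  # held
print(peer.advertise(10, "C", ("B", "C")))                     # replaces held path
print(peer.tick(30))                                           # only the last path goes out
```

Intermediate paths that were superseded during the interval are never sent, which is how MRAI reduces message load at the price of delaying the final update.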

Impact of BGP Recovery • Long Recovery Times • Measurements for isolated failures • >3 minutes for 30% of isolated failures • >15 minutes for 10% of cases • Even larger for large scale failures. • Consequences • Connection attempts over invalid routes will fail. • Long delays & compromised QoS due to heavy packet loss. • Packet loss: 30X increase, delay: 4X • Graphs taken from ref #2, Labovitz, et al.

BGP Illustration (1) • Example Network • All link costs = 1 except as shown. • Notation for best path: PSD=(N, cost) [X] • S, D: Source & destination nodes • N: Neighbor of S thru which the path goes • X: Actual path (for illustration only) • Sample starting paths to C • PBC=(D,3) [BDAC], PDC=(A,2) [DAC], PFC=(E,3) [FEAC], PIC=(H,5) [IHGAC] • Paths shown using arrows (all share seg AC) • Failure of A • BGP does not attempt to diagnose problem or broadcast failure events. [Figure: two copies of the example network with nodes A to I; link costs 2, 3 and 10 marked]

BGP Illustration (2) • NOTE: Affected node names in blue, rest in white • A’s neighbors install new paths avoiding A → • PDC=(B,5) [DBFEAC], PEC=(F,5) [EFBDAC], PGC=(H,6) [GHIBDAC] • D advertises PDC=[DBFEAC] to B • Current PBC is via D → B must pick a path not via D → • B installs PBC=(F,4) [BFEAC] & advertises it to F & I (First time) • Note: Green indicates first adv by B [Figure: two copies of the example network]

BGP Illustration (3) • E advertises PEC=[EFBDAC] to F • Current PFC is via E → • F installs PFC=(B,4) [FBDAC] & advertises to E & B • G advertises PGC=[GHIBDAC] to H • Current PHC is via G → • H installs PHC=(I,5) [HIBDAC] & advertises to I [Figure: two copies of the example network]

BGP Illustration (4) • B’s adv [BFEAC] reaches F & I • PFC=(B,4) [FBDAC] thru B → F withdraws PFC & has no path to C! • PIC=(H,5) [IHGAC] is shorter → I retains it. • F’s adv [FBDAC] reaches B: PBC=(F,4) [BFEAC] thru F → • B installs PBC=(I,6) [BIHGAC] and advertises to D, F & I • Note: Green text: B’s first adv; Grey text: B’s subsequent adv. (disallowed by MRAI) [Figure: two copies of the example network]

BGP Illustration (5a) • H’s adv [HIBDAC] reaches I • PIC=(H,5) [IHGAC] thru H → I installs PIC=(B,6) [IBDAC] & advertises to B & H. • B’s adv [BIHGAC] reaches D, F • D updates PDC=(B,8) [DBIHGAC] (Just a local update) • F updates PFC=(B,8) [FBIHGAC] & advertises to E • w/ MRAI • D & F have wrong (lower) cost metric, but will still follow the same path thru B. [Figure: two copies of the example network]

BGP Illustration (5b) • B’s adv [BIHGAC] reaches I • PIC=(B,6) [IBDAC] thru B → I withdraws PIC & has no path to C! • w/ MRAI • I will continue to use the nonworking path IBDAC. Same as having no path. • I’s adv [IBDAC] reaches B & H: • H changes its path to [HIBDAC] • B’s path is thru I, so B installs (C,10) & advertises to its neighbors D, F & I [Figure: two copies of the example network]

BGP Illustration (5c) • F’s update reaches E • E updates its path locally. • I’s withdrawal of [IBDAC] reaches H (& also B) • H withdraws the path HIBDAC & has no path to C! • H’s withdrawal of [HIBDAC] reaches G (& also I) • G withdraws the path GHIBDAC & has no path to C! • w/ MRAI • Nonworking paths stay at E, H & G [Figure: two copies of the example network]

BGP Illustration (6) – No MRAI • B’s adv [C] reaches D, F & I (in some order) • D updates its path cost (B,11) • F updates its path & cost (B,11) & advertises PFC to E. • I updates its path cost (B,13) & advertises PIC to H • Final updates • F’s update [FBC] reaches E which updates its path locally • I’s adv [IBC] reaches H • H updates its path & cost (I,14) [HIBC] & advertises PHC to G • G does a local update [Figure: two copies of the example network]

BGP Illustration (5’) – w/ MRAI • H’s adv [HIBDAC] reaches I • PIC=(H,5) [IHGAC] thru H → I installs PIC=(B,6) [IBDAC] & advertises to B & H. • I’s adv [IBDAC] reaches B & H: • H changes its path to [HIBDAC] • B’s path is thru I, so B installs (C,10) • When MRAI expires, B advertises to its neighbors D, F & I • Note: If MRAI is large, path recovery gets delayed [Figure: two copies of the example network]

BGP Illustration (6’) – w/ MRAI • B’s adv [C] reaches D, F & I (in some order) • D updates its path cost (B,11) • F updates its path & cost (B,11) & advertises PFC to E. • I installs updated path [IBC] and advertises it to H • Final updates: Same as for (6) • W/ vs. w/o MRAI: • MRAI avoids some unnecessary path updates (less router load) [Figure: two copies of the example network]

BGP: Known Analytic Results • Lots of work for isolated failures • Labovitz [1]: • Convergence delay bound for full mesh networks: O(n3) for average case, O(n!) for worst case. • Labovitz [2], Obradovic [3], Pei [8]: • Assuming unit cost per hop • Convergence delay ∝ Length of longest path involved • Griffin and Premore [4]: • V shaped curve of convergence delay wrt MRAI. • #Messages wrt MRAI decreases at a decreasing rate. • LS failures: Even harder!

Evaluation of LS Failures • Evaluation methods • Primarily simulation. Analysis is intractable • BGP Simulation Tools • Several available, but simulation expense is the key! • SSFNet – scalable, but max 240 nodes on 32-bit machine • SSFNet default parameter settings • MRAI but jittered by 25% to avoid synchronization • OSPFv2 used as the intra-domain protocol

Topology Modeling • Topology Generation: BRITE • Enhanced to generate arbitrary degree distributions • Heavy tailed based on actual measurements. • Approx: 70% low & 30% high degree nodes. • Mostly used 1 router/AS → Easier to see trends. • Failure topology: Geographical placement • Emulated by placing all AS routers and ASes on a 1000x1000 grid • The “area” of an AS ∝ No. of routers in AS
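A geographically confined failure on such a grid can be emulated roughly as below. This is only a sketch of the idea: uniform random placement, a circular failure region, and the names `place_routers` and `failed_in_region` are all assumptions made here, and the tutorial's actual placement procedure (AS area proportional to router count) is not reproduced.

```python
import random

def place_routers(n, size=1000, seed=0):
    """Place n routers uniformly at random on a size x size grid."""
    rng = random.Random(seed)
    return {i: (rng.uniform(0, size), rng.uniform(0, size)) for i in range(n)}

def failed_in_region(positions, center, radius):
    """Return the routers lying within `radius` of `center`."""
    cx, cy = center
    return {r for r, (x, y) in positions.items()
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2}

positions = place_routers(120)
failed = failed_in_region(positions, center=(500, 500), radius=150)
print(len(failed), "of", len(positions), "routers fail")
```

Varying the radius gives a simple knob for the failure extent that the following slides plot against convergence delay.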

Convergence Delay vs. Failure Extent • Initial rapid increase & then flattens out. • Delays & increase rate both go up with network size → Large failures can be a problem!

Delay & Msg Traffic vs. MRAI • Small networks in simulation → • Optimal MRAI for isolated failures small (0.375 s). • Chose a few larger values • Main observations • Larger failure → Larger MRAI more effective

Convergence Delay vs. MRAI • A V-shaped curve, as expected • Curve flattens out as failure extent increases • Optimal MRAI shifts to right with failure extent.

Impact of AS “Distance” • ASes more likely to be connected to other “nearby” ASes. • b indicates the preference for shorter distances (smaller b → higher preference) • Lower convergence delay for lower b.

Reducing Convergence Delays • Many schemes in the literature • Most evaluated only for isolated failures. • Some popular schemes • Ghost Flushing • Consistency Assertions • Root Cause Notification • Our work (Large scale failure focused) • Dynamic MRAI • Batching • Speculative Invalidation

Ghost Flushing • Bremler-Barr, Afek, Schwarz: Infocom 2003 • An adv. implicitly replaces old path • GF withdraws old path immediately. • Pros • Withdrawals will cascade thru the network • More likely to install new working routes • Cons • Substantial additional load on routers • Flushing takes away a working route! • Install BC → • Routes at D, F, I via B will start working • Flushing will take them away. [Figure: example network with nodes A to I; link costs 2, 3 and 10 marked]

Consistency Assertions • Pei, Zhao, et al., Infocom 2002 • If S has two paths S:N1xD & S:N2yN1xD, & first path is withdrawn, then second path is not used (considered infeasible). • Pros • Avoids trying out paths that are unlikely to be working. • Cons • Consistency Checking can be expensive [Figure: node S with neighbors N1 and N2; N2 reaches N1 via y, N1 reaches D via x]
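The rule on this slide can be sketched as a suffix check: once a path is withdrawn, any other stored path that ends in the same node sequence is treated as infeasible. The function `feasible_paths` and the extra path via N3 are assumptions added for illustration, and the published scheme involves more bookkeeping than this.

```python
def feasible_paths(paths, withdrawn):
    """Drop every path that ends with the withdrawn path.

    paths: list of node tuples from S's neighbor down to the destination.
    withdrawn: the node tuple that was just withdrawn.
    """
    n = len(withdrawn)
    return [p for p in paths if p[-n:] != withdrawn]

# Paths held by S (S itself omitted), following the slide's notation.
paths = [
    ("N1", "x", "D"),
    ("N2", "y", "N1", "x", "D"),   # depends on N1's path to D
    ("N3", "z", "D"),              # hypothetical independent path
]
print(feasible_paths(paths, withdrawn=("N1", "x", "D")))
```

Scanning every stored path on each withdrawal hints at why the slide lists the cost of consistency checking as a drawback.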

Root Cause Notification • Pei, Azuma, Massey, Zhang: Computer Networks, 2004 • Modify BGP messages to carry root cause (e.g., node/link failure). • Pros • Avoid paths w/ failed nodes/links → substantial reduction in conv. delay. • Cons • Change to BGP protocol. Unlikely to be adopted. • Applicability to large scale failures unclear. • Example: • D, E, G diagnose if A or link to A has failed. • Propagate this info to neighbors [Figure: example network with nodes A to I; link costs 2, 3 and 10 marked]

Large Scale Failures: Our Approach • What can’t or wouldn’t we do? • No coordination between ASes • Business issues, security issues, very hard to do, … • No change to wire protocol (i.e., no new msg type). • No substantial router overhead • Critical for large scale failures • Solution applicable to both isolated & LS failures. • What can we do? • Change MRAI based on network and/or load params (e.g., degree dependent, backlog dependent, …) • Process messages (& generate updates) differently

Key Idea: Dynamic MRAI • Increase MRAI when the router is heavily loaded • Reduces load & # of route changes. • Relationship to large scale failure • Larger failure size → Greater router loading → Larger MRAI more appropriate. • Router load directed MRAI caters to all failure sizes! • Implementation: • Queue length threshold based MRAI adjustment. [Figure: MRAI levels with queue length thresholds: Increase th1, Increase th2, Decrease th1, Decrease th2]
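A queue-length-driven MRAI adjustment with separate increase and decrease thresholds might look like the sketch below. The three MRAI levels, the threshold values and the class name `DynamicMrai` are assumptions picked for illustration; the slide names the thresholds (Increase th1/th2, Decrease th1/th2) but not their values.

```python
class DynamicMrai:
    """Step MRAI up or down based on the update queue length."""

    def __init__(self, levels, inc_th, dec_th):
        # levels[i] is the MRAI at level i (increasing).
        # inc_th[i]: queue length above which level i moves to i + 1.
        # dec_th[i]: queue length below which level i + 1 drops back to i.
        assert len(inc_th) == len(dec_th) == len(levels) - 1
        self.levels, self.inc_th, self.dec_th = levels, inc_th, dec_th
        self.idx = 0

    def update(self, queue_len):
        """Adjust the level for the observed queue length; return the MRAI."""
        if self.idx < len(self.levels) - 1 and queue_len > self.inc_th[self.idx]:
            self.idx += 1
        elif self.idx > 0 and queue_len < self.dec_th[self.idx - 1]:
            self.idx -= 1
        return self.levels[self.idx]

# Hypothetical values: MRAI in seconds, thresholds in queued messages.
ctl = DynamicMrai(levels=[0.5, 2.0, 8.0], inc_th=[20, 50], dec_th=[5, 25])
for qlen in (10, 30, 15, 60, 30, 20, 3):
    print(qlen, ctl.update(qlen))
```

Keeping each decrease threshold below the matching increase threshold gives hysteresis, so the MRAI does not flap when the queue length hovers near a single threshold.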
