This document explores the challenges in interdomain routing, particularly related to BGP's responsiveness and consistency. It introduces the concept of consensus routing, which separates safety and liveness, thereby improving the overall network availability. Key contributions include the development of stable and transient modes that help mitigate issues such as packet loss due to BGP updates and transient loops. The article provides an overview of how distributed algorithms can maintain consistent routing tables while ensuring prompt reactions to network changes, ultimately leading to enhanced performance and reduced overhead.
Consensus Routing Antonio-Gabriel Sturzu, SCPD
Table of Contents • Introduction • Consistency issues • Consensus Routing Overview • Stable Mode • Transient Mode • Performance and Overhead
Introduction • Internet routing, especially interdomain routing, has favored responsiveness over consistency • In interdomain routing, a router applies a received update to its forwarding table immediately, before propagating it to other routers • BGP updates are known to cause up to 30% packet loss for two minutes or more after a routing change • Transient loops account for 90% of all packet loss
Introduction(2) • The primary contribution of the article is that it separates the safety concept from the liveness concept, associating consistency with safety and responsiveness with liveness • Safety (consistency) means that a router forwards a packet along a path adopted by the upstream routers • Liveness means that the system reacts quickly to failures or policy changes • Separating safety and liveness improves end-to-end availability • Safety and liveness are obtained through the stable and transient modes, respectively
Consistency Issues • BGP link failures
Consistency issues(2) • BGP policy change
Consistency issues(3) • iBGP link recovery • Such blackholes can cause packet loss for tens of seconds
Consistency issues(4) • BGP policy cycles
Consensus Routing Overview • Forwards packets using • Stable mode • Transient mode • Consensus routers simply log the new routes computed by the policy engine • Periodically all routers engage in a distributed coordination algorithm that determines the most recent set of complete updates
Consensus Routing Overview(2) • The coordination is based on classical distributed snapshot and consensus algorithms • The routers use the output of the coordination to compute a set of stable forwarding tables (SFTs) that are guaranteed to be consistent
Stable Mode • The distributed coordination algorithm proceeds in epochs • Steps of an epoch k: • Update log • Distributed snapshot • The snapshot is a globally consistent view of all the updates in the system (complete or incomplete) • Frontier computation • Aggregation • Consensus • Flood
Stable Mode(2) • SFT computation • View change • Versioning • Garbage collection
Router State • Routing Information Base (RIB) • Stores for each destination • Route update received from each neighbor • Locally selected best route • Route advertised to each neighbor • History • Stores for each destination a chronological list of received and selected routes in the RIB • SFTs • Store for each destination the next-hop interfaces corresponding to the stable routes
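The router state listed above can be sketched as a few Python data structures. This is only an illustration under stated assumptions: the names `RibEntry`, `RouterState`, `receive` and `select` are hypothetical, and routes are simplified to tuples of AS names. The point it shows is that logging a received or selected route changes the RIB and the history but leaves the SFT untouched until the next coordination round.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Route = Tuple[str, ...]  # simplified AS path, e.g. ("AS2", "AS5") (assumption)


@dataclass
class RibEntry:
    """Per-destination RIB state, following the three items on the slide."""
    received: Dict[str, Route] = field(default_factory=dict)    # neighbor -> route update received
    best: Optional[Route] = None                                # locally selected best route
    advertised: Dict[str, Route] = field(default_factory=dict)  # neighbor -> route advertised


@dataclass
class RouterState:
    rib: Dict[str, RibEntry] = field(default_factory=dict)  # destination -> RIB entry
    # destination -> chronological list of (kind, route), kind is "received" or "selected"
    history: Dict[str, List[Tuple[str, Route]]] = field(default_factory=dict)
    sft: Dict[str, str] = field(default_factory=dict)        # destination -> stable next-hop interface

    def receive(self, dest: str, neighbor: str, route: Route) -> None:
        """Log a route received from a neighbor; the SFT is not modified."""
        self.rib.setdefault(dest, RibEntry()).received[neighbor] = route
        self.history.setdefault(dest, []).append(("received", route))

    def select(self, dest: str, route: Route) -> None:
        """Log the locally selected best route; the SFT is not modified."""
        self.rib.setdefault(dest, RibEntry()).best = route
        self.history.setdefault(dest, []).append(("selected", route))


state = RouterState(sft={"10.0.0.0/8": "eth0"})
state.receive("10.0.0.0/8", "AS2", ("AS2", "AS5"))
state.select("10.0.0.0/8", ("AS2", "AS5"))
```

After these two calls the history for the destination holds two entries, while `state.sft` still points to `eth0`.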
Router State(2) • Triggers • Globally unique identifier for a set of causally related events propagating through the network • (AS number, trigger number) • In consensus routing each update carries a trigger that is associated with the route being implicitly withdrawn and replaced by the route announced in the update • The trigger is used to track when the implicit withdrawal is complete
Router State(3) • In order to maintain the safety property an AS A generates a new trigger to be sent along with an update upon • A failure of the next-hop in A’s current route to the destination • A policy change that causes A to prefer another route to the destination over the current one • Receiving a route from a neighbor B that it prefers over its current route via a different neighbor C
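A minimal sketch of trigger generation, assuming a per-AS counter supplies the trigger number. The names `Trigger`, `Event`, `TriggerSource` and `trigger_for` are hypothetical. The three events that create a new trigger are the ones listed above; passing the received trigger along unchanged for any other update is an assumption, based on a trigger identifying a set of causally related events.

```python
from dataclasses import dataclass
from enum import Enum, auto
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    """(AS number, trigger number) pair, unique across the network."""
    as_number: int
    trigger_number: int


class Event(Enum):
    NEXT_HOP_FAILURE = auto()                 # next hop of the current route failed
    POLICY_PREFERS_OTHER_ROUTE = auto()       # policy change makes another route preferred
    BETTER_ROUTE_VIA_OTHER_NEIGHBOR = auto()  # preferred route arrives via a different neighbor
    OTHER = auto()                            # any update that does not switch routes (assumption)


NEW_TRIGGER_EVENTS = {
    Event.NEXT_HOP_FAILURE,
    Event.POLICY_PREFERS_OTHER_ROUTE,
    Event.BETTER_ROUTE_VIA_OTHER_NEIGHBOR,
}


class TriggerSource:
    """Hands out triggers for one AS using a local counter (assumption)."""

    def __init__(self, as_number: int) -> None:
        self.as_number = as_number
        self._numbers = count(1)

    def trigger_for(self, event: Event, inherited: Optional[Trigger] = None) -> Optional[Trigger]:
        # A route switch starts a new causal chain, so it gets a fresh trigger.
        if event in NEW_TRIGGER_EVENTS:
            return Trigger(self.as_number, next(self._numbers))
        # Otherwise keep the trigger carried by the update that caused this one.
        return inherited


source = TriggerSource(as_number=65001)
first = source.trigger_for(Event.NEXT_HOP_FAILURE)
second = source.trigger_for(Event.POLICY_PREFERS_OTHER_ROUTE)
upstream = Trigger(65002, 7)
passed_on = source.trigger_for(Event.OTHER, inherited=upstream)
```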
Frontier Computation • Aggregation • Send the set of triggers (complete or incomplete) • Consensus • Consolidators ensure that • There is no single point of failure • No single AS is trusted with the task of consolidating the snapshot • A consolidator is reachable from every AS with high probability • When consensus ends the consolidators use the snapshot report in order to compute the set of incomplete triggers I in the network
Frontier Computation(2) • In order to compute the set I they use the following idea: • A trigger is said to depend on all triggers that precede it in the history table • A trigger t is said to be complete if neither t nor any of its predecessors is incomplete • Flood • The set of incomplete triggers I and the set S of ASes that successfully participated in the distributed snapshot are sent to all ASes
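The dependency rule above can be sketched as a small fixpoint computation. The function name `incomplete_triggers` and the input shapes are hypothetical: each history is a chronological list of triggers taken from one snapshot report, and `reported_incomplete` holds the triggers the reports mark as incomplete. Treating the dependency as transitive across histories is an assumption of this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

Trigger = Tuple[int, int]  # (AS number, trigger number)


def incomplete_triggers(histories: Iterable[List[Trigger]],
                        reported_incomplete: Iterable[Trigger]) -> Set[Trigger]:
    """Return I: triggers that are incomplete or depend on an incomplete trigger."""
    # Build trigger -> set of triggers that precede it in some history.
    predecessors: Dict[Trigger, Set[Trigger]] = {}
    for history in histories:
        seen: List[Trigger] = []
        for trigger in history:
            predecessors.setdefault(trigger, set()).update(seen)
            seen.append(trigger)

    incomplete = set(reported_incomplete)
    changed = True
    while changed:  # repeat until no further trigger is pulled into I
        changed = False
        for trigger, preds in predecessors.items():
            if trigger not in incomplete and preds & incomplete:
                incomplete.add(trigger)
                changed = True
    return incomplete


histories = [
    [(1, 1), (2, 1)],  # (2, 1) follows (1, 1) in one history
    [(2, 1), (3, 1)],  # (3, 1) follows (2, 1) in another
    [(4, 1)],          # unrelated trigger
]
result = incomplete_triggers(histories, reported_incomplete={(1, 1)})
```

Here `(1, 1)` is reported incomplete, so `(2, 1)` and then `(3, 1)` are pulled into I, while `(4, 1)` stays complete.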
Transient Mode • Routing deflections • Backtracking • Detour routing • Backup routes • Use RBGP • Choosing the most link-disjoint backup route from the primary route protects against single link failures
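Choosing the most link-disjoint backup route can be sketched as follows. This is an illustrative selection rule, not RBGP's actual procedure: paths are lists of AS names, links are treated as undirected, and breaking ties by shorter path length is an assumption. The helper names `links` and `pick_backup` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set


def links(path: Sequence[str]) -> Set[FrozenSet[str]]:
    """Return the set of undirected links between consecutive ASes on a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def pick_backup(primary: Sequence[str], candidates: List[Sequence[str]]) -> Sequence[str]:
    """Pick the candidate sharing the fewest links with the primary route.

    Ties are broken by the shorter path (assumption). Raises ValueError
    if there are no candidates.
    """
    primary_links = links(primary)
    return min(candidates, key=lambda path: (len(links(path) & primary_links), len(path)))


primary = ["A", "B", "C", "D"]
candidates = [
    ["A", "B", "E", "D"],       # shares link A-B with the primary
    ["A", "F", "G", "D"],       # shares no link with the primary
    ["A", "B", "C", "H", "D"],  # shares links A-B and B-C
]
backup = pick_backup(primary, candidates)
```

The selected backup is `["A", "F", "G", "D"]`, which survives the failure of any single link on the primary route.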
Performance • Link failures • For BGP, 13% of failures cause at least half of all ASes to experience routing loops • For Consensus Routing with transient forwarding • Backtracking enables continuous connectivity for at least 74% of all ASes following 99% of failure cases • With detouring, connectivity is 98.5% • With backup routes, connectivity is 98%
Performance(2) • Policy change • For BGP, in more than 55% of the test cases ASes were disconnected from the destination due to transient loops formed during convergence • Consensus routing transitions from one set of consistent loop-free routes to another, completely avoiding transient loops
Overhead • Volume of control traffic
Overhead(2) • Cost of consensus • For 9 nodes, all the nodes learned the agreed value in under 450 milliseconds • For 18 and 27 nodes, the times were 1.4 and 1.8 seconds • Path dilation • Measures how far packets have to be redirected
Overhead(3) • Path dilation
Overhead(4) • Response time • A 30 second epoch results in more than 90% of the paths being adopted in less than 2 minutes
Overhead(5) • Implementation Overhead • Consensus Routing adds 8% in update processing and about 11% additional lines of code to the BGP implementation