
Management of Routing Protocols in IP Networks






Presentation Transcript


  1. Management of Routing Protocols in IP Networks Ph.D. Defense Aman Shaikh Computer Engineering, UCSC November 18, 2003 Ph.D. Defense

  2. Introduction • Internet connects millions of computers • Internet is packet-switched: • Each packet travels independently of the rest • Routers provide connectivity • Routers forward packets so that they reach their ultimate destination • Forwarding is destination-based and hop-by-hop • Router decides next-hop (i.e., neighbor router) for each packet based on its destination address • Routing protocols allow routers to determine next-hop(s) for every destination Ph.D. Defense

  3. Management of Routing Infrastructure • Management of routing infrastructure is a nightmare • “Simple core (= routing infrastructure), smart edge (= end hosts)” design paradigm • Internet only provides a best-effort, connectionless, unreliable service • Routing is not designed with manageability in mind • Large distributed system • Hundreds of routers and thousands of links in big service provider networks • Variety of routing protocols • The infrastructure is evolving • New services require new protocols and devices Ph.D. Defense

  4. Dissertation Contribution • Focuses on management of Open Shortest Path First (OSPF) protocol • OSPF is widely used to control routing within service provider and enterprise networks • Three areas of focus • Monitoring • Characterization • Maintenance Ph.D. Defense

  5. Monitoring • Motivation: • Effective management requires sound monitoring systems • Contribution: • Design and implementation of an OSPF monitor • Deployment in two commercial networks • Has proved valuable for trouble-shooting and identifying impending problems at an early stage • Collection and archiving of OSPF data that is used for performance improvement, post-mortem analysis and further research

  6. Characterization • Motivation: • Need sound simulation and analytical models for scalability studies, addition of new features, etc. • How do we parameterize these models? • Need vendor-independent benchmarking methods • Contribution: • Black-box techniques for estimating OSPF processing delays within a router • Has become basis for OSPF benchmarking standardization efforts • Case study of OSPF dynamics in an enterprise network

  7. Maintenance • Motivation: • Maintenance of routers occurs fairly frequently • Protocol enhancements, bug fixes, hardware/software upgrades • During maintenance, operators have to withdraw router undergoing maintenance • Leads to route flapping and instability • How to perform seamless maintenance? • Contribution: • I’ll Be Back (IBB) capability for OSPF • Allows “router-under-maintenance” to be used for forwarding Ph.D. Defense

  8. Outline • Background • Routing and OSPF overview • Design of an IP router • Monitoring • OSPF Monitor • Characterization • Black-box measurements for OSPF • Case study of OSPF dynamics • Maintenance • I’ll Be Back (IBB) Capability for OSPF • Conclusions and future work Ph.D. Defense

  9. Routing in the Internet • Internet is a collection of Autonomous Systems (ASes) • Two classes of routing protocols • IGP (Interior Gateway Protocols) • Used within an AS • Examples: OSPF, IS-IS, RIP, EIGRP • EGP (Exterior Gateway Protocols) • Used across ASes • Example: BGP [Figure: ASes AS1 to AS5 connected by BGP; labels inside the ASes: OSPF, IS-IS, RIP]

  10. Overview of OSPF • OSPF is a link-state protocol • Every router learns entire network topology • Topology is represented as graph • Routers are vertices, links are edges • Every link is assigned weight through configuration • Every router uses Dijkstra’s single source shortest path algorithm to build its forwarding table • Router builds Shortest Path Tree (SPT) with itself as root • Shortest Path Calculation (SPF) • Packets are forwarded along shortest paths defined by link weights Ph.D. Defense
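The SPF step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation in any router: the four-router topology, its weights, and the function names `shortest_path_tree` and `next_hops` are all assumptions made for the example, and equal-cost multipath is ignored.

```python
import heapq

def shortest_path_tree(graph, root):
    """Dijkstra's algorithm: return (dist, parent) with root as the tree root.

    graph maps each router to a dict {neighbor: link weight}.
    """
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry, a shorter path was already settled
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def next_hops(parent, root):
    """Walk each destination back up the tree to find the first hop from root."""
    table = {}
    for dest in parent:
        if dest == root:
            continue
        hop = dest
        while parent[hop] != root:
            hop = parent[hop]
        table[dest] = hop
    return table

# Hypothetical topology with configured link weights (not from the slides).
graph = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R3": 2, "R4": 1},
    "R3": {"R1": 5, "R2": 2, "R4": 10},
    "R4": {"R2": 1, "R3": 10},
}
dist, parent = shortest_path_tree(graph, "R1")
fib = next_hops(parent, "R1")
```

Here R1 reaches every destination through R3, because the path R1, R3, R2 (cost 7) is cheaper than the direct R1 to R2 link (cost 10).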

  11. Areas in OSPF • OSPF allows domain to be divided into areas for scalability • Areas are numbered 0, 1, 2 … • Hub-and-spoke with area 0 as hub • Every link is assigned to exactly one area • Routers with links in multiple areas are called border routers [Figure: Area 0, Area 1 and Area 2, with border routers marked]

  12. Summarization with Areas • Each router learns • Entire topology of its attached areas • Information about subnets in remote areas and their distance from the border routers • Distance = sum of link costs from border router to subnet [Figure: two panels, "OSPF domain" and "R1's View", showing routers R1, R2, R3 in Area 0, border routers B1 and B2, and routers C1, C2 with subnets 10.10.4.0/24 and 10.10.5.0/24 in Area 1; link weights are shown on the edges]

  13. Link State Advertisements (LSAs) • Every router describes its local connectivity in Link State Advertisements (LSAs) • Router originates an LSA due to… • Change in network topology • Example: link goes down or comes up • Periodic soft-state refresh • Recommended value of interval is 30 minutes • LSA is flooded to other routers in the domain • Flooding is reliable and hop-by-hop • Includes change and refresh LSAs • Flooding leads to duplicate copies of LSAs being received • Every router stores LSAs (self-originated + received) in link-state database (= topology graph) Ph.D. Defense
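Because flooding delivers duplicate copies, a receiver has to decide whether an incoming LSA is new. The sketch below shows one simplified way to do that, keyed on LS type, link-state ID and advertising router, and comparing only sequence numbers. It is an assumption-laden illustration: real OSPF also takes LS age and checksum into account, and the class and field names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LSA:
    ls_type: int
    ls_id: str
    adv_router: str
    seq: int  # simplified: real OSPF also compares checksum and age

class LinkStateDatabase:
    """Toy link-state database that filters duplicates seen during flooding."""

    def __init__(self):
        self._db = {}

    def receive(self, lsa):
        key = (lsa.ls_type, lsa.ls_id, lsa.adv_router)
        current = self._db.get(key)
        if current is None or lsa.seq > current.seq:
            self._db[key] = lsa
            return "install"    # newer instance: store it and flood onward
        if lsa.seq == current.seq:
            return "duplicate"  # same instance arrived over another path
        return "stale"          # older than what is already stored

lsdb = LinkStateDatabase()
outcomes = [
    lsdb.receive(LSA(1, "10.0.0.1", "10.0.0.1", seq=1)),
    lsdb.receive(LSA(1, "10.0.0.1", "10.0.0.1", seq=1)),
    lsdb.receive(LSA(1, "10.0.0.1", "10.0.0.1", seq=2)),
    lsdb.receive(LSA(1, "10.0.0.1", "10.0.0.1", seq=1)),
]
```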

  14. Adjacency • Neighbor routers (i.e., routers connected by a physical link) form an adjacency • The purpose is to make sure • Link is operational and routers can communicate with each other • Neighbor routers have consistent view of network topology • To avoid loops and black holes • Link gets used for data forwarding only after adjacency is established • Use of periodic Hellos to monitor the status of link and adjacency Ph.D. Defense

  15. Design of an IP Router [Figure: router block diagram. Control plane: Route Processor (CPU) running OSPF, BGP and RIP processes, each with its own routing calculation, and a Route Manager. Data plane: Forwarding Info. Base (FIB), Switching Fabric, and interface cards forwarding data packets]

  16. Outline • Background • Monitoring • Motivation: • Effective management requires sound monitoring systems • Contribution: OSPF monitor • Design • Three components and their functionality • Deployment in two commercial networks • How OSPF Monitor is being used • Lessons learnt through deployment • Characterization • Maintenance • Conclusions and future work

  17. OSPF Monitor: Objectives • Real-time analysis of OSPF behavior • Trouble-shooting, alerting • Real-time snapshots of OSPF network topology • Off-line analysis • Post-mortem analysis of recurring problems • Identify anomaly signatures and use them to predict impending problems • Allow operators to tune configurable parameters • Improve maintenance procedures • Analyze OSPF behavior in commercial networks Ph.D. Defense

  18. Related Work • Route monitoring • Commercial IP monitors • Route Dynamics (IPSUM), Route Explorer (PacketDesign) • IPMON project at Sprint • IS-IS and BGP listeners • RouteViews and RIPE • Collects BGP updates from several networks • Topology tracking • OSPF topology server [shaikh:jsac02] • Evaluation and comparison of LSA-based versus SNMP-based approaches • Rocketfuel project at UW Seattle • Inference of intra-domain topologies from end-to-end measurements Ph.D. Defense

  19. Components • Data collection: LSA Reflector (LSAR) • Passively collects OSPF LSAs from network • “Reflects” streams of LSAs to LSAG • Archives LSAs for analysis by OSPFScan • Real-time analysis: LSA aGgregator (LSAG) • Monitors network for topology changes, LSA storms, node flaps and anomalies • Off-line analysis: OSPFScan • Tools for analysis of LSA archives • Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics Ph.D. Defense

  20. Example [Figure: LSAR 1 and LSAR 2 receive LSAs from an OSPF network (Area 0, Area 1, Area 2); each writes an LSA archive and "reflects" LSAs to LSAG for real-time monitoring; LSA archives are replicated for off-line analysis by OSPFScan]

  21. How LSAR attaches to Network • Host mode • Join multicast group • Adv: completely passive • Disadv: not reliable, delayed initialization of LSDB • Full adjacency mode • Form full adjacency with a router • Adv: reliable, immediate initialization of LSDB • Disadv: LSAR’s instability can impact entire network • Partial adjacency mode • Keep adjacency in a state that allows LSAR to receive LSAs, but does not allow data forwarding over link • Adv: reliable, LSAR’s instability does not impact entire network, immediate initialization of LSDB • Disadv: can raise alarms on the router Ph.D. Defense

  22. LSA aGgregator (LSAG) • Analyzes “reflected” LSAs from LSARs over TCP connections in real-time • Generates console messages: • Changes in OSPF network topology • ADJACENY COST CHANGE: rtr 10.0.0.1 (intf 10.0.0.2)  rtr 10.0.0.5 old_cost 1000 new_cost 50000 area 0.0.0.0 • Node flaps • RTR FLAP: rtr 10.0.0.12 no_flaps 7 flap_window 570 sec • LSA storms • LSA STORM: lstype 3 lsid 10.1.0.0 advrt 10.0.0.3 area 0.0.0.0 no_lsas 7 storm_window 470 sec • Anomalous behavior • TYPE-3 ROUTE FROM NON-BORDER RTR: ntw 10.3.0.0/24 rtr 10.0.0.6 area 0.0.0.0
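A node-flap alert like the RTR FLAP message on this slide could be produced by a sliding-window counter. The sketch below is only a guess at the general shape of such a check: the slides do not say how LSAG defines a flap or which thresholds it uses, so `FlapDetector`, `window_sec` and `min_flaps` are hypothetical. Only the message layout is copied from the slide.

```python
from collections import defaultdict, deque

class FlapDetector:
    """Report a router when it flaps at least min_flaps times within window_sec."""

    def __init__(self, window_sec=600, min_flaps=7):
        self.window_sec = window_sec  # assumed value, not from the slides
        self.min_flaps = min_flaps    # assumed value, not from the slides
        self._events = defaultdict(deque)

    def record(self, router_id, timestamp):
        events = self._events[router_id]
        events.append(timestamp)
        # Drop flap events that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_sec:
            events.popleft()
        if len(events) >= self.min_flaps:
            span = int(events[-1] - events[0])
            return (f"RTR FLAP: rtr {router_id} "
                    f"no_flaps {len(events)} flap_window {span} sec")
        return None

detector = FlapDetector()
messages = [detector.record("10.0.0.12", t)
            for t in (0, 95, 190, 285, 380, 475, 570)]
```

With these assumed thresholds, the first six events return `None` and the seventh produces the alert.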

  23. OSPFScan • Tools for off-line analysis of LSA archives • Parse, select (based on queries), and analyze • Derivation and analysis of auxiliary information from LSA archives • LSAs indicating network topology changes • Routing table entries • How OSPF routing tables evolved in response to network changes • What the end-to-end path within the OSPF domain looked like at any instant • Topology changes as graph-based abstraction • Vertex addition/deletion and link addition/deletion/change_weight • Playback of topology change events • Essentially an LSAG playback

  24. Deployment • Deployed in two commercial networks • Enterprise network • 15 areas, 500+ routers; Ethernet-based LANs • Deployed since February, 2002 • LSA archive size: 10 MB/day • LSAR connection: host mode • ISP network • Area 0, 100+ routers; Point-to-point links • Deployed since January, 2003 • LSA archive size: 8 MB/day • LSAR connection: partial adjacency mode Ph.D. Defense

  25. LSAG in Day-to-day Operations • Generation of alarms by feeding messages into higher layer network management systems • Correlation and grouping of messages into a single alarm • Prioritization of messages • Validation of maintenance steps and monitoring the impact of these steps on network-wide OSPF behavior • Example: • Operators change link weights to carry out maintenance activities • A “link-audit” web-page allows operators to keep track of link weights in real-time Ph.D. Defense

  26. Problems Caught by LSAG • Equipment problem • Detected internal problems in a crucial router in enterprise network • Problem manifested as episodes of OSPF adjacency flapping • Configuration problem • Identified assignment of same router-ids to two routers in enterprise network • OSPF implementation bug • Caught a bug in refresh algorithm of routers from a particular vendor in ISP network • Bug resulted in a much faster refresh of LSAs than standards-mandated rate Ph.D. Defense

  27. Long Term Analysis by OSPFScan • LSA traffic analysis • Identified excessive duplicate LSA traffic in some areas of the enterprise network • Led to root-cause analysis and preventative steps • Generation of statistics • Inter-arrival time of change LSAs in the ISP network • Fine-tuning configurable timers related to SPF calculation • Mean down-time and up-time for links and routers in the ISP network • Assessment of reliability and availability as ISP network gears for deployment of new services Ph.D. Defense

  28. Lessons Learnt through Deployment • New tools reveal new failure modes • Real networks exhibit significant activity • Maintenance and genuine problems • Archive all LSAs • LSA volume is manageable • Stability and reliability of monitor is extremely important • Keep data collection separate from its analysis • Keep data collector as simple as possible • Add functionality incrementally and through interaction with users Ph.D. Defense

  29. Summary • Three component architecture • LSAR: LSA capture from the network • LSAG: real-time analysis of LSA stream • Detection and trouble-shooting of problems • OSPFScan: off-line analysis tools for LSA archives • Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics • Deployed in two commercial networks • Has proven a valuable network management tool • “OSPF Monitor was a lifesaver” • VP of Networking, Enterprise network • When monitor caught an impending failure at an early stage

  30. Outline • Background • Monitoring • Characterization • Motivation: • Simulation and analytical models, benchmarking • Contributions: • Black-box techniques for estimating OSPF processing delays on a router • Tasks we measure, methodology, results for Cisco and GateD • Case study of OSPF dynamics in an enterprise network • Maintenance • Conclusions and future work Ph.D. Defense

  31. Black-box Measurements for OSPF • OSPF processing delays within a router matter! • Add up to impact convergence and stability • Guidance in tuning configurable parameters, head to head vendor comparisons, simulation models • Instrumenting routing code for measuring delays is challenging • Commercial implementations are proprietary • May involve grappling with • Numerous code versions, hardware platforms, and developers • Use black-box measurements • Measure the timing delays using external observations • Applied to Cisco and GateD OSPF implementations Ph.D. Defense

  32. Related Work • White-box measurements for IS-IS [alaettinoglu] • SPF delays reported are comparable to results obtained by us • Empirical analysis of router behavior under large BGP routing tables [chang:imw02] • Cisco and Juniper routers • Benchmarking Methodology working group (bmwg) at IETF • Drafts related to OSPF benchmarking • Our black-box methods are basis for some benchmark tests Ph.D. Defense

  33. What tasks did we measure? • LSA Processing • LSA Flooding • SPF Calculation • FIB Update [Figure: router block diagram showing the Route Processor (CPU) with the OSPF process and its topology view, the FIB, switching fabric and interface cards, with LSAs, LS Acks and data packets arriving at the router]

  34. Methodology • Load emulated topology on target router • Initiate task of interest • Measure the time for task [Figure: testbed in which TopTracker sends LSAs describing the emulated topology to the target router]

  35. Measuring Task Time • Use a black-box method to bracket task start and finish times • Subtract out intervals that precede and exceed these times • X = A - (B + C) [Figure: time axis marked with top bracket event, task start time, task finish time and bottom bracket event; intervals labeled A, B, C and X]

  36. Measuring SPF Calculation • X = A - (B + C + D + E) • Estimate the overhead = B + C + D + E [Figure: message timeline between TopTracker and target router. TopTracker loads the desired topology, sends the initiator LSA, then sends a duplicate LSA; the target router sends an ack for the duplicate LSA. Events on the time axis: initiator LSA arrives, SPF calculation starts, SPF calculation ends, ack for duplicate LSA arrives. Intervals labeled A, B, C, D, E and X]

  37. Estimating the Overhead • Remove SPF calculation from bracket • spf_delay = 60 seconds • overhead = B + C + D + E [Figure: message timeline between TopTracker and target router. TopTracker sends the initiator LSA and then a duplicate LSA. Events on the time axis: initiator LSA arrives, initiator LSA processing done, duplicate LSA arrives, duplicate LSA processing done and ack sent, ack for duplicate LSA arrives, SPF calculation starts. Intervals labeled B, C, D and E]
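The arithmetic behind slides 35 to 37 is a subtraction of two externally measured intervals: the bracket A measured with the task inside it, and the overhead B + C + D + E measured with the task pushed out of the bracket. The sketch below works through that subtraction. The sample values are made up for illustration, and taking the median over repeated runs is an assumption of this example, not something the slides specify.

```python
from statistics import median

def estimate_task_time(bracket_samples_ms, overhead_samples_ms):
    """Return X = A - overhead, using the median of repeated measurements.

    bracket_samples_ms:  interval A, measured with the task inside the bracket.
    overhead_samples_ms: B + C + D + E, measured with the task removed from
                         the bracket (e.g. by delaying the SPF calculation).
    """
    a = median(bracket_samples_ms)
    overhead = median(overhead_samples_ms)
    return a - overhead

# Hypothetical measurements in milliseconds (not results from the dissertation).
bracket_ms = [12.4, 12.6, 12.5]
overhead_ms = [2.5, 2.4, 2.6]
spf_time_ms = estimate_task_time(bracket_ms, overhead_ms)
```

With these made-up samples the estimate is 12.5 ms minus 2.5 ms, that is, 10.0 ms.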

  38. Results • Results for Cisco GSR, 7513 and GateD • For GateD, comparison of black-box results with those obtained using instrumentation (white-box) • Route processors • Cisco: 200 MHz R5000 processor • GateD: 500 MHz AMD-K6 processor • Topology: full n x n mesh with random OSPF edge weights • n in range 10, 20, …, 100

  39. Results for Cisco Routers • Observations • Similar results for two models • SPF calculation time is O(n^2)

  40. Results for GateD • Observations: • Black-box over-estimates white-box measurement • Black-box captures the characteristics very well Ph.D. Defense

  41. Summary • Black-box methods for estimating OSPF processing delays • Work across wide range of time delays • Work for pure CPU bound tasks • Effective in capturing scaling • Match with white-box measurements • Applied methods to Cisco GSR and 7513 • LSA Processing: 100-800 microseconds • LSA flooding: 30-40 milliseconds • Pacing timer is the determining factor • SPF calculation: 1-40 milliseconds • O(n^2) behavior for full n x n mesh • FIB update time: 100-300 milliseconds • No dependence on topology size
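One way to read the quadratic SPF behavior on a full mesh is through the link count alone. The worked step below is an interpretation, not a claim from the slides about how any vendor's SPF code is implemented. It assumes only that the calculation examines every link at least once.

```latex
% Number of links in a full mesh of n routers
|E| = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad n = 100 \;\Rightarrow\; |E| = 4950 .
% Any SPF run that examines every link at least once therefore does
% \Omega(n^2) work on this topology.
```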

  42. Outline • Background • Monitoring • Characterization • Motivation: • Simulation and analytical models, benchmarking • Contributions: • Black-box techniques for estimating OSPF processing delays on a router • Case study of OSPF dynamics in an enterprise network • Enterprise network topology, categorization of LSA traffic, results • Maintenance • Conclusions and future work Ph.D. Defense

  43. Case Study of OSPF Dynamics • OSPF behavior in commercial networks is not well understood • Understanding dynamics of LSA traffic is key to better understanding of OSPF • Bulk of OSPF processing is due to LSAs • Big impact on OSPF convergence, (in)stability • Analysis of LSA archives collected by OSPF monitor in enterprise network • Focus on April, 2002 data Ph.D. Defense

  44. Related Work • Several studies focusing on BGP dynamics in the Internet • Relatively easy to collect BGP data • BGP is more complicated • OSPF dynamics in a regional service provider network (MichNet) [watson:icdcs03] • One year worth of data • Several findings are similar to our observations • Analysis of OSPF stability through simulations [basu:sigcomm01] Ph.D. Defense

  45. Enterprise Network • Provides customers with connectivity to applications and databases residing in data center • OSPF network • 15 areas, 500 routers • This case study covers 8 areas, 250 routers • One month: April, 2002 • Ethernet-based LANs • Customers are connected via leased lines • Customer routes are injected via EIGRP into OSPF • The routes are propagated via external LSAs Ph.D. Defense

  46. Enterprise Network Topology • Monitor uses host mode to receive LSAs [Figure: OSPF domain with Area 0, Area A, Area B and Area C; customers attach through external EIGRP; servers hosting databases and applications; inset of Area A showing LAN 1 and LAN 2, border routers B1 and B2 toward Area 0, and the monitor]

  47. Categorizing LSA Traffic • Refresh LSA traffic • Originated due to periodic soft-state refresh • Forms base-line LSA traffic • Can be predicted using configuration information • Change LSA traffic • Originated due to changes in network topology • E.g., link goes down/comes up • Allows detection of anomalies and problems • Duplicate LSA traffic • Received due to redundancy in flooding • Overhead -- wastes resources
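The three categories can be illustrated with a small classifier over a stream of received LSAs. This is a simplified sketch under stated assumptions: the slides do not give the rule the study actually used, so here an LSA counts as a duplicate if its sequence number is not newer than the stored instance, as a refresh if it is newer but carries the same contents, and as a change otherwise. The function and variable names are hypothetical.

```python
def classify(seen, key, seq, body):
    """Classify one received LSA as 'change', 'refresh' or 'duplicate'.

    seen maps an LSA key to the (seq, body) of the newest instance so far.
    """
    prev = seen.get(key)
    if prev is not None and seq <= prev[0]:
        return "duplicate"  # an instance at least this new was already received
    seen[key] = (seq, body)
    if prev is not None and prev[1] == body:
        return "refresh"    # new instance, same contents: soft-state refresh
    return "change"         # first sighting, or contents differ

seen = {}
key = ("router-lsa", "10.0.0.1")  # hypothetical LSA identifier
labels = [
    classify(seen, key, 1, "cost=10"),
    classify(seen, key, 1, "cost=10"),
    classify(seen, key, 2, "cost=10"),
    classify(seen, key, 3, "cost=50"),
]
```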

  48. LSA Traffic in Different Areas [Figure: four panels (Area 0, Area 2, Area 3, Area 4) plotting refresh, change and duplicate LSAs against days; annotations mark genuine anomalies and an artifact caused by the 23-hour day on Apr 7]

  49. Baseline LSA Traffic: Refresh LSAs • Refresh LSA traffic can be reliably predicted using router configuration files • Important for workload generation [Figure: panels for Area 2 and Area 3; x-axis: days]
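A back-of-the-envelope version of this prediction follows from the 30-minute refresh interval mentioned on slide 13. The sketch assumes that every LSA in an area is refreshed exactly once per interval and that the LSA count has already been derived from the configuration files. The count of 100 is hypothetical.

```python
def predicted_refresh_lsas(num_lsas, hours=24, refresh_interval_min=30):
    """Expected refresh LSAs in a period if each LSA refreshes once per interval."""
    return num_lsas * (hours * 60) // refresh_interval_min

# Hypothetical area with 100 LSAs (not a figure from the case study).
per_normal_day = predicted_refresh_lsas(100)           # 48 refreshes per LSA
per_23_hour_day = predicted_refresh_lsas(100, hours=23)  # e.g. a daylight-saving day
```

Under these assumptions a normal day yields 4800 refresh LSAs and a 23-hour day 4600, which is the kind of dip a shortened day would produce in a per-day plot.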

  50. Refresh process is not synchronized • No evidence of synchronization • Contrary to simulation-based study [basu:sigcomm01] • Reasons • Changes in the topology help break synchronization • LSA refresh at one router is not coupled with LSA refresh at other routers • Drift in the refresh interval of different routers Ph.D. Defense
