Evolving Toward a Self-Managing Network

Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex

Why is Network Management So Darn Hard?
• Oodles and oodles of complex features
  • Many protocols
  • Many mechanisms
  • Many configurable parameters
• Little guidance for network administrators
  • How to select and compose features?
  • How to set the configurable parameters?
• Managing boxes, rather than networks
  • Routers, switches, firewalls, IDSes, servers, etc.
  • Low-level, box-specific configuration languages

The Enemy is Complexity
• Goal: raising the level of abstraction
  • Network-level design and configuration
  • Composition of protocols and mechanisms
• Idea #1: add abstraction on top
  • Compile high-level spec into box configuration
  • But, must grapple with inherent complexity
• Idea #2: design system for manageability
  • Identify network-level abstractions
  • … and change the boxes and protocols
  • But, must grapple with backwards compatibility

Example: Border Gateway Protocol
• ASes exchange reachability information
  • IP prefix: block of destination IP addresses
  • AS path: sequence of ASes along the path
• Configurable routing policies
  • Path selection (which route to use?)
  • Path export (who to tell about the route?)
[Figure: ASes 88, 1, and 7018; route announcements “12.34.158.0/24: path (88)” and “12.34.158.0/24: path (7018,1,88)”; data traffic flowing toward 12.34.158.5]
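The announcements on this slide can be sketched in a few lines of Python. This is only an illustration of how an AS path grows as a route is passed along, not an implementation of BGP. The `Route` class, the `announce` helper, its export rule, and the neighbor AS 64512 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str               # block of destination IP addresses
    as_path: Tuple[int, ...]  # sequence of ASes along the path


def announce(route: Route, local_as: int, neighbor_as: int) -> Optional[Route]:
    """Return the route as local_as would advertise it to neighbor_as.

    Hypothetical export rule: tell every neighbor, except one that already
    appears on the path (it would reject the route as a loop anyway).
    """
    path = (local_as,) + route.as_path
    if neighbor_as in path:
        return None
    return Route(route.prefix, path)


# AS 88 originates the prefix; AS 1 and AS 7018 pass it along.
origin = Route("12.34.158.0/24", ())
via_88 = announce(origin, local_as=88, neighbor_as=1)
via_1 = announce(via_88, local_as=1, neighbor_as=7018)
via_7018 = announce(via_1, local_as=7018, neighbor_as=64512)  # 64512: made-up neighbor

print(via_88.as_path)    # (88,)
print(via_7018.as_path)  # (7018, 1, 88)
```

Each AS prepends its own number before exporting, which is why the path seen beyond AS 7018 reads (7018, 1, 88) while AS 1 only sees (88).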

Some Things I Hate About BGP…
• Routers in an AS have different views
  • Effect: protocol oscillation and loops
  • Point fix: testing sufficient conditions
• Routing policy distributed across routers
  • Effect: routers need to share information
  • Point fix: complex “tagging” of BGP routes
• Policy has only an indirect effect on traffic
  • Effect: selecting the right policy is hard
  • Point fix: “what if” tools for traffic engineering
• BGP route selection depends on the IGP
  • Effect: disruptions from small internal changes
  • Point fix: “what if” tools to identify risks
[Slide annotations: “Too distributed”, “Too indirect”]
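The last bullet, that BGP route selection depends on the IGP, can be illustrated with a toy “what if” check. The sketch assumes a router that has equally good BGP routes at two egress points, so the IGP cost to each egress breaks the tie. The function names, egress names, and costs are all made up for the example.

```python
from typing import Dict, Tuple


def closest_egress(igp_cost: Dict[str, int]) -> str:
    """Pick the egress with the lowest IGP cost; break ties by name."""
    return min(igp_cost, key=lambda egress: (igp_cost[egress], egress))


def what_if(igp_cost: Dict[str, int], egress: str, new_cost: int) -> Tuple[str, str]:
    """Return the chosen egress before and after one IGP cost changes."""
    before = closest_egress(igp_cost)
    after = closest_egress({**igp_cost, egress: new_cost})
    return before, after


# IGP costs from one router to two egress points (hypothetical values).
costs = {"egress_east": 10, "egress_west": 12}

print(what_if(costs, "egress_east", 15))  # ('egress_east', 'egress_west')
print(what_if(costs, "egress_east", 11))  # ('egress_east', 'egress_east')
```

A small internal change (cost 10 to 15) moves this router's traffic to a different egress, while a smaller one (10 to 11) does not. That is the kind of risk a “what if” tool would flag.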

Interdomain Routing: Design for Manageability
• Routing Control Platform
  • Represents the AS to others
  • Has complete view of candidate routes
  • Computes answers for the AS’s routers
• Communicates with other ASes
  • Using BGP or (ideally) a brand new protocol
[Figure: AS 1, AS 2, and AS 3, each with an RCP; the RCPs are linked by an inter-AS protocol, the ASes by physical peering]

Advantages of RCP Approach
• Lower management complexity
  • Complete, network-wide view
  • Direct control over the routers
  • Single specification of policies and objectives
• Simpler routers
  • Much less control-plane software
  • Much less configuration state
• Enabling innovation
  • New algorithms for selecting paths within an AS
  • New approaches to inter-AS routing

Deployability: Backwards Compatibility using BGP
• Border Gateway Protocol (BGP)
  • Protocol: messages sent between routers
  • Decision logic: route-selection process
  • Policy: configurable rules for path selection/export
• The key point is that BGP has
  • Complex decision logic and policies
  • Yet a simple protocol (and message format)
• Use BGP messages to “program” the routers

Phase 1: Flexible Path Selection in One AS
Before: conventional use of BGP in backbone network
After: RCP learns routes and sends answers to routers
[Figures: eBGP and iBGP sessions before; eBGP, RCP, and iBGP sessions after]
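A minimal sketch of the “after” picture: a central function that sees every candidate route and computes one answer per router. The ranking used here (shorter AS path, then lower IGP cost to the egress, then egress name) is a simplified stand-in chosen for the example, not the decision process of BGP or of the RCP prototype. The names `Candidate` and `compute_answers`, the router names, AS 2, and the costs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    prefix: str
    as_path: Tuple[int, ...]
    egress: str  # router where the route was learned over eBGP


def compute_answers(
    candidates: List[Candidate],
    igp_cost: Dict[str, Dict[str, int]],  # igp_cost[router][egress] -> cost
) -> Dict[str, Dict[str, Candidate]]:
    """Return answers[router][prefix]: the route each router should use."""
    by_prefix: Dict[str, List[Candidate]] = {}
    for c in candidates:
        by_prefix.setdefault(c.prefix, []).append(c)

    answers: Dict[str, Dict[str, Candidate]] = {}
    for router, costs in igp_cost.items():
        answers[router] = {
            prefix: min(
                routes,
                key=lambda c: (len(c.as_path), costs[c.egress], c.egress),
            )
            for prefix, routes in by_prefix.items()
        }
    return answers


candidates = [
    Candidate("12.34.158.0/24", (1, 88), "r1"),
    Candidate("12.34.158.0/24", (2, 88), "r2"),
]
igp_cost = {
    "r1": {"r1": 0, "r2": 5},
    "r2": {"r1": 5, "r2": 0},
    "r3": {"r1": 7, "r2": 3},
}
answers = compute_answers(candidates, igp_cost)
for router in sorted(answers):
    print(router, "->", answers[router]["12.34.158.0/24"].egress)
```

In a deployment along the lines of the slide, each answer would be sent to its router as an ordinary iBGP message, so the routers themselves need no changes.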

Phase 2: AS-Wide Path Selection and Export
Before: RCP gets “best” iBGP routes (and IGP feed)
After: RCP gets all eBGP routes from neighbors
[Figures: eBGP, RCP, and iBGP sessions before and after]

Phase 3: Direct Communication Between RCPs
Before: RCP gets all eBGP routes from neighbors
After: ASes exchange routes via RCP
[Figures: eBGP, RCP, and iBGP sessions before; after, RCPs in AS 1, AS 2, and AS 3 linked by an inter-AS protocol, with iBGP inside each AS and physical peering between ASes]

Systems Considerations (NSDI’05)
• Reliability
  • Problem: single point of failure
  • Solution: replication of RCP components
• Consistency
  • Problem: inconsistent decisions by replicas
  • Solution: consistency without inter-replica protocol
• Scalability
  • Problem: storing and computing for all routers
  • Solution: store each route once and amortize work
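One possible reading of “amortize work” is sketched below: if several routers see exactly the same IGP costs to the egress points, the selection only has to be computed once for the whole group. This is an assumption made for illustration and is not taken from the NSDI’05 design. The function name, router names, and costs are hypothetical.

```python
from typing import Dict, Tuple


def amortized_egress_choice(
    igp_cost: Dict[str, Dict[str, int]],  # igp_cost[router][egress] -> cost
) -> Tuple[Dict[str, str], int]:
    """Choose an egress per router, computing once per distinct cost view.

    Returns the per-router choice and the number of computations performed.
    """
    cache: Dict[Tuple[Tuple[str, int], ...], str] = {}
    choice: Dict[str, str] = {}
    computations = 0
    for router, costs in igp_cost.items():
        key = tuple(sorted(costs.items()))  # routers with equal views share a key
        if key not in cache:
            cache[key] = min(costs, key=lambda e: (costs[e], e))
            computations += 1
        choice[router] = cache[key]
    return choice, computations


# Three routers, two of which have identical views (hypothetical values).
igp_cost = {
    "a1": {"east": 4, "west": 9},
    "a2": {"east": 4, "west": 9},
    "b1": {"east": 8, "west": 2},
}
choice, computations = amortized_egress_choice(igp_cost)
print(choice)        # {'a1': 'east', 'a2': 'east', 'b1': 'west'}
print(computations)  # 2
```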

Example Network Management Applications
• Customer-driven route selection
  • Customized load-balancing policies
  • Geographic rules for route selection
• Blocking denial-of-service attacks
  • “Blackhole” routes that drop traffic
  • Only for routers carrying attack traffic
• Hitless maintenance
  • Move traffic away from certain routers
  • Before the operators bring down the routers
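The blackhole application can be sketched as a per-router override of the computed answers: only routers that carry attack traffic get a route that drops it. The `BLACKHOLE` marker, the `apply_blackhole` helper, and the router and next-hop names are assumptions for the example. A real deployment would express the drop as a route toward a discard next hop.

```python
from typing import Dict, Set

BLACKHOLE = "blackhole"  # hypothetical marker for a next hop that drops traffic


def apply_blackhole(
    next_hop: Dict[str, Dict[str, str]],  # next_hop[router][prefix] -> next hop
    attack_prefix: str,
    routers_carrying_attack: Set[str],
) -> Dict[str, Dict[str, str]]:
    """Return new per-router tables with the attack prefix blackholed
    only on the listed routers. The input tables are left unchanged."""
    updated: Dict[str, Dict[str, str]] = {}
    for router, table in next_hop.items():
        new_table = dict(table)
        if router in routers_carrying_attack:
            new_table[attack_prefix] = BLACKHOLE
        updated[router] = new_table
    return updated


tables = {
    "r1": {"12.34.158.0/24": "r3"},
    "r2": {"12.34.158.0/24": "r3"},
}
result = apply_blackhole(tables, "12.34.158.0/24", {"r1"})
print(result["r1"]["12.34.158.0/24"])  # blackhole
print(result["r2"]["12.34.158.0/24"])  # r3
```

Hitless maintenance could follow the same pattern, with the override steering traffic away from a router before it is taken down instead of dropping it.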

Conclusion
• Network management is too hard
  • IP was not designed for management
  • Complex, distributed operation of routers
• Must reduce complexity
  • Network-wide views and objectives
  • Direct control over the data plane
• RCP approach is feasible
  • Deployable, scalable, and reliable
  • Solves important management problems
• Many interesting open problems

Backup Slides

Routing Control Platform (RCP)
[Figure: inside the RCP, a Route Control Server (RCS), an OSPF Viewer, and a BGP Engine; arrows labeled Options, Answers, and Topology; the network supplies OSPF link-state advertisements and exchanges BGP updates with the RCP]

Scalability: Standard Computing Platform
• Prototype on a high-end PC
  • 3.2 GHz Pentium-4 with 8 GB of RAM
  • Running the Linux 2.6.5 kernel
• Workload from the AT&T backbone
  • Replay the BGP and OSPF messages
• Good RCP performance
  • Memory usage: less than 2 GB
  • Speed, BGP changes: less than 40 msec
  • Speed, topology changes: 0.1-0.8 seconds
Short answer: the system can keep up

Reliability: Replication and Consistency
• Replication: avoid single point of failure
  • Multiple RCPs in a network
  • Connected at different places
• Consistency: no explicit coordination
  • Replica has full view of each partition
  • Replicas perform the same algorithm on the same data, and get the same answer
[Figure: two replicas, RCP A and RCP B, with regions labeled “A, B”, “A”, and “B”]
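The consistency bullet can be illustrated with a selection rule that depends only on the set of routes, never on the order in which a replica happened to receive them. The ranking below is a stand-in chosen for the example, and the route tuples are made up.

```python
from typing import List, Tuple

RouteInfo = Tuple[str, Tuple[int, ...]]  # (egress router, AS path)


def select(routes: List[RouteInfo]) -> RouteInfo:
    """Deterministic choice: shortest AS path, then lowest path, then egress."""
    return min(routes, key=lambda r: (len(r[1]), r[1], r[0]))


# Two replicas hold the same routes but received them in a different order.
replica_a = [("r1", (1, 88)), ("r2", (2, 88))]
replica_b = [("r2", (2, 88)), ("r1", (1, 88))]

print(select(replica_a) == select(replica_b))  # True
print(replica_a[0] == replica_b[0])            # False: "first seen" would disagree
```

Because both replicas run the same function on the same data, they agree without exchanging any messages. A rule such as “keep the first route seen” would not have that property.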
