Large IP Network Architecture Design SYSC 4700 Winter 2014

Andrew Brown, CCIE# 4234 (Emeritus), PMP. Manager, Test Engineering, Service Provider Routing Technology Group – Broadband / Service Provider WiFi, Cisco. Email: browna@cisco.com. CCO Home Page: http://www.cisco.com/en/US/products/ps9343/index.html

Speaker Bio: Andrew Brown. Andrew Brown received his B.Eng in Electrical Engineering from Carleton University in 1992 and was also a student in 94.470. He holds the Cisco Certified Internetwork Expert and Project Management Professional certifications. Since 1992 Andrew has been a member of the network engineering teams at the Department of Foreign Affairs and International Trade Canada, iStar Internet and PSINet. He joined Cisco's product development organization in 1999. Since then Andrew has been with Cisco managing various software test teams in Kanata developing Cisco hardware and software for worldwide internet service provider customers. He lives in Kanata with his wife Eva and three children.

Agenda • IP Network Design Goals • IP Network Design Principles • Internet Routing

What Is The Problem? Enterprise applications must be available and perform well. Networks that deliver on this requirement: • Have consistently high-performance packet forwarding • Are reliable and available • Are service enabling • Are cost efficient

Network Design Goals • Reliable & Resilient (High Network Availability) • Services & Applications – Guarantee Access • Network Performance Levels (QOS) - Consistency • Single Points of Failure - Eliminate • When Failures do happen: • Automate Recovery • Minimize overall impact (Hierarchical Design) • Flexible & Scalable • Expected & Unexpected network traffic patterns • Minimize - sustaining cost / effort • Maximize - network MTBF

Network Design Goals • Operations, Administration & Management (OAM) • Easily maintainable - by NOC staff • Network Monitoring and Control - Simplify • Service Provisioning - Simplify • POP Architecture – Standardize (think LEGO) • Cost Efficient • Use what you need, where you need it - no more, no less • Open Standards (IEEE, IETF, DSL Forum…) • Build with standards-based protocols • Multi-vendor solution • "It should be as simple as possible but no simpler" - Albert Einstein

Hierarchy, Structure and Modularity • Is functional and divides the problem • Creates failure domain boundaries • Creates manageable building blocks • Fundamentally, we are breaking the network design process into manageable blocks so that the network will function within the required performance and scale limits of applications, protocols and network services

Network Topology • A well designed Topology is the fundamental basis for all stable networks • Key Principles: • The area affected by a topology change should be minimized • Routers should carry a minimum amount of routing information • Goal is Fast Network Convergence

What is Network Convergence? • Network convergence: • Time needed for traffic to be rerouted to the alternative or more optimal path after a network event • Requires all affected routers to process the event and update their route forwarding tables • Network Convergence: 4 Step Process • Detect that a network event has occurred • Propagate the event to all impacted nodes • Process the event on all impacted nodes • Update related forwarding structures

Topologies - Point to Point “Full Mesh” (Diagram nodes: TOR, OTT, VIC, CAL, VAN, EDM, HFX, MTL) Characteristics: • Separate connections between nodes • No hierarchy • Limited scalability • Layer 2 based thinking • Not exploiting benefit of multiplexing • N(N-1)/2 circuits !! • Problem = Increased routing complexity
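
The N(N-1)/2 circuit count above can be checked with a short Python sketch. The function name `full_mesh_circuits` and the larger node counts are illustrative assumptions; only the eight node labels come from the slide.

```python
def full_mesh_circuits(n: int) -> int:
    """Number of point-to-point circuits needed to fully mesh n routers."""
    return n * (n - 1) // 2


# Node labels taken from the slide's diagram.
nodes = ["TOR", "OTT", "VIC", "CAL", "VAN", "EDM", "HFX", "MTL"]
circuits = full_mesh_circuits(len(nodes))

# Hypothetical larger networks, to show how quickly the count grows.
growth = {n: full_mesh_circuits(n) for n in (8, 16, 32, 64)}

print(circuits)  # 28
print(growth)
```

Doubling the number of routers roughly quadruples the number of circuits, which is the "limited scalability" the slide refers to.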

Topologies – Point to Point “Partial Mesh” Characteristics: GE/10GE, POS • Each router is connected to two or more routers • Hierarchical • Somewhat Scalable • Uses fewer interfaces compared to full mesh • Cost effective $$ • Routing Complexity can be controlled (Diagram labels: Distribution Routers, Aggregation Routers)

Topologies - Switched Characteristics: GE/10GE, ATM, POS • Each router is connected to one or more switches • Switch = failure point • Somewhat Hierarchical • More Scalable than full/partial mesh • Cost effective $$ • Routing Complexity can be controlled (Diagram labels: Distribution Routers, Aggregation Routers)

Topologies - Ring Characteristics: • “Fat data pipe” on ring between nodes • No single point of failure • Scalable - one interface per router !! • Logical layer 3 full mesh • Hierarchical as you want • Cost optimal - circuits follow best route $$ • Routing complexity can be simple to complex (Diagram labels: Core/Distribution Routers, RPR / GE, Aggregation Routers)

Hierarchical Network Design Model - Each Layer Provides a Unique Function • Core layer provides optimal transport between core routers and distribution sites • Distribution layer provides policy-based connectivity, peer reduction, and aggregation • Access layer provides common group access to the internetworking environment (Diagram label: Service Provider WAN)

Core (Backbone) Functions • Provide Transport Bandwidth • Path Optimization for Fast Convergence • Path Redundancy - No Single Points of Failure • Full Routing Reachability Information • Traffic Differentiation by “Traffic Class” • IP Precedence/DSCP, Queuing, Congestion Control (WRED) • Load balancing across links • No Policy Enforcement

Distribution (Aggregation) Functions • Traffic Aggregation (Multiplexing) • Topology Change Isolation • Backbone Traffic Management • IP Precedence/DSCP, Queuing, Congestion Control (WRED) • Policy Enforcement (Filtering / QOS) • Security ACLs, Traffic coloring, policing, shaping, rate-limiting... • Control Route Table Size • Primary Strategies Used: • Route Re-distribution/Summarization • Minimize Core to Distribution Layer Connections

Access Functions • User Connection to Network • leased line, DSL, Cable, Wireless, EPON... • Access Control to Network • Access Control Lists (Traffic Filters) • “Edge” Services / Deep Packet Inspection • Packet Classification • Tunnel Termination (Virtual Private Networks) • Encryption • Traffic Metering and Accounting • Access Security

Create Redundancy • Problem: • Hierarchical Network Design Creates Single Points of Failure ! • Redundancy Compensates for this Weakness • Different Strategies for Core, Distribution and Access • Competing Goals - • Maximize – Number of Failures network can survive • Minimize – Latency / Delay • Minimize - Network Paths / Routes

Core Layer Redundancy Design 1/2 • Ring Core Design • Two Paths to any Destination on the ring • Normal Operation – Packet follows minimum hops • Single Link Failure - Increases Hop Count • Dual Link Failure - Creates “network islands”
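
The ring behaviour described above (minimum hops normally, a longer path after one link failure, islands after two) can be reproduced with a small breadth-first search. This is a minimal sketch assuming a hypothetical six-router ring numbered 0 to 5; the ring size and the choice of failed links are not from the slides.

```python
from collections import deque


def ring_links(n):
    """All links of an n-router ring, each stored as an unordered pair."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def hops(n, links, src, dst):
    """Hop count from src to dst using only the given links, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        # On a ring, each router has at most two neighbours.
        for nxt in ((node + 1) % n, (node - 1) % n):
            if frozenset((node, nxt)) in links and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


N = 6  # hypothetical ring size
links = ring_links(N)

normal = hops(N, links, 0, 1)
single_failure = hops(N, links - {frozenset((0, 1))}, 0, 1)
dual_failure = hops(N, links - {frozenset((0, 1)), frozenset((3, 4))}, 0, 1)

print(normal, single_failure, dual_failure)  # 1 5 None
```

With both links down, routers 1, 2 and 3 form one island and routers 4, 5 and 0 form the other, so traffic between the islands has no path at all.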

Core Layer Redundancy Design 2/2 • Full-Mesh Core Design • Most Redundant Topology • Large Number of Alternate Paths • Potential Convergence/Scale Problems • Normal Operation - one IP Hop • Link Failure - Increased Hop Count • Partial Mesh Core Design • Compromise Between Hop Count & Redundancy • Some Routing Protocols Don’t Handle Point-to-Multipoint Designs Well (e.g. OSPF)

Distribution Layer Redundancy Design • Main Concern: • Unexpected traffic patterns/routing • Dual Homing to the Core • Doubles Number of Paths in the Core • Backup Link Between Distribution Routers • Distribution routers could prefer redundant link to Core ! (Diagram labels: Core, Distribution, Access)

Access Layer Redundancy Design • Main Concerns: • Control Network Paths • Core Routing Table Size • Dual Homing to Distribution Layer • Core route table explosion ! • Rule: Do Not Advertise Redundant Links as Normal Paths • Interconnect Access Routers • Redundant Link Should be able to Handle Traffic of Both Access Sites to the Core (Diagram labels: Core, Distribution, Access)

IP Route Summarization ! • Router A to Router B: "I Can Route to the 172.16.0.0/16 Network" • Networks in diagram: 172.16.25.0/24, 172.16.26.0/24 • A Routing Table: 172.16.25.0/24, 172.16.26.0/24, 172.16.27.0/24 • B Routing Table: 172.16.0.0/16, 172.16.27.0/24 • Achieves two main Goals: • Controls size of route table • Localizes topology change information • ….. reduces network convergence times • ….. increases network stability
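
The two goals of summarization can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch, not a router implementation: the helper `advertised_by_a` is a hypothetical name, and it assumes router A advertises the single 172.16.0.0/16 summary whenever it holds at least one more-specific route inside it.

```python
import ipaddress

SUMMARY = ipaddress.ip_network("172.16.0.0/16")


def advertised_by_a(specifics, summary=SUMMARY):
    """Routes A sends to B: the summary if any specific falls inside it,
    plus any specifics that the summary does not cover."""
    inside = [n for n in specifics if n.subnet_of(summary)]
    outside = [n for n in specifics if not n.subnet_of(summary)]
    return ([summary] if inside else []) + outside


# Router A's routing table from the slide.
a_table = [
    ipaddress.ip_network(p)
    for p in ("172.16.25.0/24", "172.16.26.0/24", "172.16.27.0/24")
]

before = advertised_by_a(a_table)

# Simulate 172.16.26.0/24 going down behind router A.
after_flap = advertised_by_a([n for n in a_table if str(n) != "172.16.26.0/24"])

print(before)      # one route instead of three
print(after_flap)  # unchanged, so B sees no topology change
```

B carries one route instead of three, and the loss of a single /24 behind A never reaches B, which is what keeps the topology change local.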

Where to Summarize IP Routes? • Only provide topology change information where needed • Distribution layer is key: • Summarize to Core • Send Default to Access (Diagram labels: Core (Backbone), Distribution (Aggregation), Access (Edge))

Redundancy Design – Network Failure Mode Analysis (Diagram labels: “Peer” Providers, Internet, Service Provider PoP, Business, Residential, DSL, cable, dial, Access Layer, Distribution Layer, Core Layer)

Redundancy “Inside the Router” !! • Standby Route Processor (RP) takes control of the router after a hardware or software fault on the Active RP • SSO (Stateful Switchover) - standby RP takes immediate control and maintains connectivity protocols (layer 2) • NSF (Nonstop Forwarding) – standby RP continues to forward packets until route convergence is complete (layer 3) (Diagram labels: State Information, Active RP, Standby RP, Line Card, Line Card)

Internet Routing Challenges • Scalability (IGP routing does not scale) • Stability (100s of thousands of routes) • “Policy” routing is key… • How do you prevent carrying a competitor's traffic? • How do you control where your traffic is sent?

Interior vs. Exterior Routing Protocols • Interior Protocols (e.g. RIP/OSPF) • Peer discovery > automatic • IGP peers > “trusted” • Route distribution > all IGP peers • Route updates > periodically flooded • Exterior Protocols (e.g. BGP) • Peer discovery > configured • BGP peers > “un-trusted” outside networks • Route distribution > based on configured “policies” • Route updates > on demand

Internet Routing = BGP • IGP routing in each AS is hidden from the outside world • Thousands of Routes • Many autonomous networks • Single IGP can NOT handle this ! (Diagram labels: ISP A AS 1, ISP B AS 2, ISP C AS 3, Enterprise A AS 10, routers A to G, ISP to ISP Peering, ISP to Customer Peering)

BGP Summary of Operation • BGP peers connect over TCP (port 179) • Peers exchange messages to open peering connection • Initial BGP route table exchange • Incremental BGP updates ongoing • Keepalive messages between BGP peers ongoing (Diagram labels: Peering, routers A and B, AS 100, AS 101)

BGP Incremental Routing Updates • Once BGP sends a route to a peer, it assumes the peer will keep it unless: • A replacement route is sent (implicit withdraw of the old route) • The route is withdrawn (explicit withdraw) • The BGP session goes down (keepalive failure)
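
The three ways a learned route can disappear can be sketched as a toy per-peer route table. This is a simplified illustration, not BGP itself: the class `PeerRoutes`, its method names and the AS paths are all hypothetical, and real BGP carries many more attributes per route.

```python
class PeerRoutes:
    """Routes learned from one peer, kept until replaced, withdrawn or the session drops."""

    def __init__(self):
        self.routes = {}  # prefix -> AS path (hypothetical, simplified attributes)

    def update(self, prefix, as_path):
        # A new announcement for a known prefix overwrites the old one:
        # this is the implicit withdraw.
        self.routes[prefix] = as_path

    def withdraw(self, prefix):
        # Explicit withdraw of a single prefix.
        self.routes.pop(prefix, None)

    def session_down(self):
        # Session loss (e.g. keepalive failure) invalidates everything from this peer.
        self.routes.clear()


peer = PeerRoutes()
peer.update("192.100.50.0/24", (200, 100))  # initial table exchange
peer.update("172.16.0.0/16", (200,))

peer.update("192.100.50.0/24", (300, 100))  # replacement route, implicit withdraw
after_replace = dict(peer.routes)

peer.withdraw("172.16.0.0/16")              # explicit withdraw
after_withdraw = dict(peer.routes)

peer.session_down()                         # session failure
after_down = dict(peer.routes)

print(after_replace, after_withdraw, after_down)
```

Between these events nothing is re-sent, which is what makes the updates incremental rather than periodic.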

Single Home Networks • Most common configuration on Internet !! • Single exit point, single ISP - No need for BGP • Customer points static default to ISP • ISP advertises stub network • Routing policy confined within upstream ISP’s policy (Diagram labels: Internet, AS 200 (ISP), AS 100 (Customer), 192.100.50.0/24)

Multi-Homed Networks – why? • Maximal Redundancy & Reliability • One connection to Internet means you are dependent on: • Local router (configuration, software, hardware) • WAN media (physical failure, carrier failure) • Upstream ISP (configuration, software, hardware) • Enterprise applications demand continuous availability • Downtime = lost $$

Multi-Homed Networks – Watch out! • Using multiple providers does not guarantee circuit diversity • ISPs often cross common facilities • Mutual fate sharing is still an issue (Diagram labels: The Internet, Provider 1, Provider 2, Single CO*, Multiple Racks; Single Fiber, Multiple Wavelengths; 192.100.50.0/24) *CO: Central Office *ILEC: Incumbent Local Exchange Carrier *LEC: Local Exchange Carrier http://en.wikipedia.org/wiki/Local_exchange_carrier

Multi-Homed Network Scenarios • Scenario #1 - Single ISP router • Scenario #2 - Multiple ISP routers • Scenario #3 - Multiple ISP / Customer Routers • Scenario #4 - Multiple ISPs

Scenario #1 - Single ISP router • Outbound routing - • Use default route • Inbound routing - • Option 1: ISP can use static routes or IGP to learn your routes and load share • Option 2: Can use BGP to load share with private AS peering (Diagram labels: ISP, AS 201)

Scenario #2 - Multiple ISP routers • Outbound Routing - • Use two equal cost defaults to reach ISP • Inbound Routing – • Same as Scenario #1 (Diagram labels: ISP, D, F, 0.0.0.0, 0.0.0.0, A, AS 201)
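
One common way to share traffic across two equal cost defaults is to hash each flow onto one of the next hops. The sketch below assumes that D and F in the diagram are the two ISP routers and that load sharing is done per flow; the function `pick_default`, the hash choice and the sample addresses are illustrative assumptions, not the method any particular router uses.

```python
import hashlib

NEXT_HOPS = ["D", "F"]  # assumed to be the two ISP routers in the diagram


def pick_default(src_ip, dst_ip, next_hops=NEXT_HOPS):
    """Choose one of the equal cost default routes by hashing the flow.

    Packets of the same flow always hash to the same next hop, so they
    are not reordered across the two links.
    """
    digest = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
    return next_hops[digest[0] % len(next_hops)]


# Hypothetical flow from the customer network to an outside host.
first = pick_default("192.100.50.10", "198.51.100.7")
again = pick_default("192.100.50.10", "198.51.100.7")

# If the link to D fails, only F remains and every flow uses it.
after_failure = pick_default("192.100.50.10", "198.51.100.7", next_hops=["F"])

print(first, again, after_failure)
```

Per-flow hashing spreads load only when there are many flows; a single large flow still uses one link.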

Scenario #3 - Multiple ISP/Customer Routers • Outbound Routing - Use two defaults • “Watershed effect” • Inbound routing – Same as Scenario #1 (Diagram labels: ISP, D, F, Inject Default 1 0.0.0.0, Inject Default 2 0.0.0.0, A, B, AS 201)

Scenario #4 – Multiple ISPs (Diagram labels: Tier 1 ISP AS 4, Tier 1 ISP AS 5, AS 6, Tier 1 ISP AS3, Tier 2 ISP AS 2, AS 201, routers A, B, C, D, E)

RFC 3439: Key Internet Architecture Design Principles • Simplicity Principle – Complexity is the primary source of inefficient scaling and increased capital and operational expenses (CAPEX / OPEX) • Implication = We must drive our architectures and designs toward the simplest possible solutions • “End-to-end protocol design” should not rely on maintenance of state inside the network. The complexity of the Internet belongs at the Edge (AKA Distribution layer)

RFC 3439: Key Internet Architecture Design Principles • Amplification Principle – There are non-linearities that occur at scale which do not occur at small to medium scale • Implication = In a large IP network even small things can and do cause huge network events • It has been shown that increasing BGP inter-connectivity results in more complex and slower BGP routing convergence • Ensure local network changes only have local effect – ISOLATE FAILURE DOMAINS

RFC 3439: Key Internet Architecture Design Principles • Coupling Principle – As things get larger they often exhibit increased interdependence between components (AKA unforeseen feature interaction) • Implication = The more network events that occur simultaneously, the larger the likelihood that they will interact in strange and unexpected ways • Coupling is intimately related to routing synchronization and network convergence • Many of the non-linear effects in networks (Amplification Principle) are also related to coupling • Keep it as simple as possible – only add new features / protocols when essential • Minimize protocol layering… IP over WDM vs IP over ATM over SONET over DWDM etc

Suggested Readings: • Ciscopress.com • Internet Routing Architectures • Advanced IP Network Design • Large-Scale IP Network Solutions • Building Resilient IP Networks • Cisco.com • BGP Case Studies: http://www.cisco.com/warp/public/459/18.html • RFCs • Some Internet Architectural Guidelines and Philosophy (RFC 3439) • Architectural Principles of the Internet (RFC 1958)

FINAL THOUGHTS & ADVICE • NEVER STOP LEARNING • FOCUS ON DEVELOPING YOUR SOFT SKILLS NOT JUST HARD TECHNICAL SKILLS • SEEK OUT MENTORS & SURROUND YOURSELF WITH PEOPLE SMARTER THAN YOU • REFLECT OFTEN - AM I ON THE RIGHT LADDER? AM I FOCUSED ON THE MOST IMPORTANT THINGS? THE FIRST 3 BOOKS YOU SHOULD READ AFTER UNIVERSITY: • Getting Things Done - David Allen • The 7 Habits of Highly Effective People - Stephen Covey • Emotional Intelligence at Work - Daniel Goleman GOOD LUCK IN YOUR CAREERS !!

Supplemental Notes / References • IP Address Planning • How to Select an IP Routing Protocol • IP Multicast Overview • Internet QOS Overview

IP Address Planning - IP Addressing Design • Goal is Network Stability • Address Allocation Generally Considered an Administrative Function ! • Addresses Difficult to Change After Assigned (DHCP helps) • Poor Addressing Contributes to Almost all Large IP Network Failures • Routing Stability Dependent on # Routes Propagated in the Network and # Network Changes • # Routes -> Summarization -> Addressing (Diagram labels: Stability, Summarization, Addressing, Topology)

IP Address Planning - Address Allocation Strategies • First Come First Served • Don’t do this ! • Politically (e.g. by department) • Doesn't Scale Well ! • Geographically (e.g. by region) • Some gains but some routes will not be summarized - fragmentation • Topologically • Most Effective for Network Stability • Assign Addresses Based on the Router the Network is Attached to
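
The difference between topological and first come first served allocation shows up as soon as routes are aggregated. The sketch below uses `ipaddress.collapse_addresses` from the Python standard library; the 10.1.x.0/24 prefixes are hypothetical examples for the networks behind one distribution router.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefixes into the fewest covering routes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(nets))


# Topological: networks behind the same router get one contiguous, aligned block.
topological = summarize(
    ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
)

# First come first served: the same router ends up with scattered prefixes.
first_come = summarize(
    ["10.1.0.0/24", "10.1.5.0/24", "10.1.9.0/24", "10.1.14.0/24"]
)

print(topological)      # a single /22
print(len(first_come))  # 4 routes, nothing can be merged
```

The same four networks cost the core one route in the first case and four in the second, and the gap widens with every router added.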

IP Address Planning - Managing Address Space Depletion • Problem: Available IPv4 address space is disappearing quickly! • Solutions: • Variable Length Subnet Masking (VLSM) • Classless address allocation - (CIDR RFC 1517-1520) • Internet Address Registry (ARIN) • Network Address Translation (NAT - RFC 1631) • Use of private address space (RFC 1918) • IPv6 Deployment
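
Variable length subnet masking can be illustrated by carving one block into subnets sized to their use. This is a minimal sketch with Python's `ipaddress` module; the 192.168.10.0/24 block (RFC 1918 private space) and the split into two LANs plus point-to-point links are hypothetical choices.

```python
import ipaddress

block = ipaddress.ip_network("192.168.10.0/24")

# Split the /24 in half: the first /25 becomes a large LAN.
lan_a, remainder = block.subnets(new_prefix=25)

# Split the second half again: one /26 LAN, one /26 left for links.
lan_b, link_pool = remainder.subnets(new_prefix=26)

# Carve the last /26 into /30s, each holding one point-to-point link.
p2p_links = list(link_pool.subnets(new_prefix=30))

print(lan_a, lan_a.num_addresses - 2)  # usable hosts in the large LAN
print(lan_b, lan_b.num_addresses - 2)  # usable hosts in the smaller LAN
print(len(p2p_links), p2p_links[0])    # number of /30 links and the first one
print(block.is_private)                # True for RFC 1918 space
```

With a single fixed mask, each point-to-point link would consume as much address space as a LAN; here sixteen links fit in one quarter of the block.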
