Globally Optimal Distributed Batch Reconfiguration for Hazard-free Dynamic Provisioning:

Globally Optimal Distributed Batch Reconfiguration for Hazard-free Dynamic Provisioning: How an Entire Network can “Think Globally and Act Locally” Wayne D. Grover grover@trlabs.ca University of Alberta and TRLabs Edmonton, AB, Canada DRCN 2007 La Rochelle, France, Oct. 70-10

Setting the stage..what motivates this proposal? • US National Science Foundation: • Calls for “completely new approaches to network operations.” • Identifies robust networking as one of the “grand challenges” in networking science • Concern that existing peer-to-peer asynchronous distributed provisioning scheme has the risks of network state incoherence • E.g. [1] Pandi & Wosinska, ICTON-RONEXT 2005 • Separately, in other industries, there is a move to exploring the applications and benefits of “on-line O.R.” • Existing distributed provisioning schemes can only employ greedy solution methods • What if a whole network could “think globally, but act locally”? • Greater resource efficiencies, greatly reduced signalling, hazard-free operation, continual near-optimality (“self consolidating”)

What Problem(s) are we trying to solve in dynamic protected service provisioning? • The inherent risk of schemes that operate dynamically, on the per-connection timescale, assuming global state coherency at all times. • Risky ! • High signalling volumes • Rather than trying to quantity and lower the risk: is there some approach that fundamentally avoids the risk in the first place? • In existing concepts provisioning is per-path with no chance to globally optimize • Periodic re-optimization of overall network configuration is awkward or unaddressed.

Overview • What is the problem? • Review key aspects of current dynamic provisioning concept • Key Concepts of New Proposal • Outline of Operation • Sub-study: Benefits of Batch Incremental Re-optimization problem • Sample results • Summary of Advantages and Disadvantages • Research Directions

Establish a protected connection to node 11 0 1 2 3 4 Spare capacity sharing 5 7 8 Establish a protected connection to node 2 9 6 10 11 12 SBPP Dynamic Protected Service Provisioning Concept(1)

I want to establish a protected connection to node 11 0 1 LSA PATH LSA 2 RESV PATH LSA 3 4 RESV LSA PATH 4. Flood LSA messages RESV 5 PATH 7 8 RESV PATH 9 RESV 6 10 PATH RESV PATH 11 PATH 12 RESV RESV LSA: Link State Advertisement SBPP Dynamic Protected Service Provisioning Concept(2) SBPP route computation and signaling process 1. Compute working and protection routes Working: 0-4-8-11; Protection: 0-3-7-9-12-11 2. Establish working path 3. Establish protection path

Observations / Concerns about Dynamic SBPP • Every node needs and assumes a complete and current network state database, and existing current protection capacity sharing relationships • Link state updates are advertised on a per-connection basis • Link state updates are disseminated asynchronously by any node at the same time other nodes are relying and acting upon time critical state information. • The total database of network state that is operationally “critical” grows at least as O(n3) with size of the network or operating domain and also intensifies with frequency of changes in the network.

Alternatives for Dynamic Automated Provisioning • Centralized Control: Global view, one operation at a time. • Safe (in the present regard) but other downsides • Apply packet priorities to update messages, use TE summary packets, etc. • i.e., measures to try to just mitigate the risk • Will eventually crash when provisioning is dynamic enough • “Protected Working Capacity Envelope” Concept • Removes protection arrangements from the per-connection time scale Refs: Grover- Comm Mag, Shen & Grover, Shen PhD; available at www.ece.ualberta.ca/~grover • Proposal: “Globally Optimal” Distributed Synchronous Batch Re-optimization • Eliminates the hazard of database incoherence • Framework yields other advantages

Key Concepts of New Proposal • Nodes in these networks have “precise time” ! Can we exploit that? • YES: Time synchronization can help in data synchronization • “Small-batch incremental reoptimization” provisioning • not path-by-path instantaneous asynchronous provisioning • Globally synchronous change actions, not asynchronous actions • Reliance on “precise time” to coordinate actions and decisions. • Relegating all operationally critical signaling for state update to non- real-time communication requirements • Robust confirmation of global state database coherence before any reliance upon it for network actions • Solving a globally optimal reconfiguration solution • But nodes act locally to put into effect their parts only of globally optimal reconfiguration plans.

How it works: Operational phases: • Batch Change Accumulation • Change Dissemination and Confirmation • Globally Optimal Reconfiguration Solution • Local change activation

Step 1. Batch Change Accumulation • 5 to 10 minute interval envisaged • More generally, the period is relative to the connection holding time and request rate (I.e. provisioning traffic intensity) • Nodes make no changes to network connection state during this time • Nodes observe: • New requests • Departures (released connections) as they arise at their location only. • At end of the period nodes emit a change summary packet • Like an existing LSA, but contains batch change info • Robust error detection / correction encoded on packet • This dissemination isnot real-time-critical

Step 2. Change Dissemination and Confirmation • (Again, no change is made to network state made in this phase) • Nodes receive “batch change” summary packets from each other. • SLA-like forwarding as in OSPF (Internet) • May include pre-arranged scheduled service requests • This data exchange is not real-time critical • This process overlaps with the next change Accumulation phase • Nodes integrate change packets received into single network-wide re-provisioning summary view of the new requirements. • Each node then emits a global change summary checksum • Each node wait until an intermediate time mark: If every “heard” checksum matches own: proceed, • Else: flood out “wave-off: go around” signal Partial change list Checksum of integrated overall network change list Partial change list Checksum of integrated overall network change list Partial change list

Step 3. “Thinking Globally”: Optimal Reconfiguration Solution • Each node locally solves an instance of the globally optimal reconfiguration problem • May be any problem version network operator prefers • Example: Route and protect maximum number of the new service requests • While reclaiming capacity from released connections • With or without permission to re-optimize existing protection other Variants: • Multiple priorities or protection classes (multi-QoP) • Permission to re-arrange selected working paths • Strategies to include hedging against future uncertainty • Impairment aware, availability aware routing, etc. • Nodal solutions have to be “identical not just equivalent” • Prospect here for true “on-line O.R.” • any reduced complexity versionof the optimal problem can also be substituted here

Step 4.“Acting Locally”: Node do their part of the solution (only) • On the next globally precise-time mark: • Each node activates the switching matrix changes to put into effect its part (only) of the complete network reconfiguration solution. • No continuing existing connection is altered • Service access nodes observe the turn-up of their new connections and test end-to-end. • New operating phase commences • Change request accumulation continues Note that this results in creation of a complete set of new service paths and their protection arrangements simultaneously in parallel on the network with no signaling. Correctness of the outcome is independently validated by each end-node pair (as it would be in any case).

* * * Disseminate, n * Activate, n Compute, n Accumulate, n Disseminate, n+1 Activate, n+1 * Compute, n+1 Accumulate, n+1 * = precise-time instant Accumulate, n+2 time Network state only changes at these synchronous instants…they are like the clock edges in a digital logic circuit Overall Network Synchronous Phases

Sub-Study: Benefits of Optimal Batch Incremental Re-Provisioning (with Z. Pandi on COST 270 STSM to TRLabs) • Simulation of an “on-line O.R.” application for batch incremental re-optimization that is made possible by this framework. • Statistically non-stationary random traffic demand • i.e., not just random but spatially and temporally evolving random arrival / departure traffic • Tests / illustrates ability for scheme to inherently track and re-optimize for time-evolving demand patterns • Each node accumulates batch change info • At end of each batch period, globally optimal incremental reconfiguration problem solved (on a single CPU) • Global changes put into effect locally in simulated network • Compared performance against asynchronous independent provisioning using best known SBPP provision algorithm

Simulation Details • SBPP protection principle vs. small batch incremental reoptimization. (AMPL Model is Appendix to the paper) • Spare capacity allocations re-optimized each interval as well as new and released working paths routed • Networks • Sparse topology • High degree topology • Scenarios • Stationary random • Temporal overload • Temporo-spatial N-S and E-W evolutions; • Accumulation intervals from 0.2 to 0.4 holding times

Test Networks Time-space non-stationary statistical evolution of demand pattern Original EU Model Sparse Version

Sample Performance Results Full topology, spatial evolution Batching interval = 0.4 mean holding time Full topology, general overload Sparse topology, spatial evolution Total number of blocking events Time (unit holding times)

Summary: Properties, Advantages, Disadvantages • (+) Eliminates the hazard of database incoherency under asynchronous operation • All critical state-exchange becomes non-time critical (+) • (+) Network enjoys the efficiency and adaptability of on-line continual global re-optimization of network state • (-?) New connections are provisioned in the next provisioning cycle, not “instantaneously.”( service activation delay) • Nodes synchronize their actions using existing network network time/frequency assets. • analogy to clocked logic circuit robustness • Service protection still acts at any time in response to actual failure, • provisioning cycle skipped so protection action is reflected in next change accumulation period.

Research Directions within this framework • Incremental re-optimization models and strategies that the framework enables • Options such as spare capacity re-optimization or not • Multi QoP classes, priorities • Working re-arrangeable service classes? • Maximum revenue, minimum load, etc: different objectives • Multi-QoP provisioning solutions • Incremental on-line grooming optimization • Different approaches to “identical not just equivalent” solution of the disparate instances of the same global optimization problem. • Links to the “scheduled lightpath” connection planning problem and the “network consolidation problem.” • Accommodation for a top-priority no-delay service class • If thought essential • Extension to Domains rather than nodes • Links to the PWCE concept • Collaborations in this area already begun with B. Jaumard, Networks OR group at Concordia U., Montreal

Your Feedback and Questions are most welcomed Wayne Grover grover@ece.ualberta.ca

Extra slides Wayne D. Grover University of Alberta and TRLabs Edmonton, AB, Canada \

Protecting network Protection capacity=4 6 1 8 0 7 5 3 10 Initial network Total deployed capacity=16 16 16 0 0 0 4 4 4 16 16 2 2 2 16 16 16 5 5 5 16 1 1 1 3 3 3 Protecting Working network Working capacity=12 10 15 8 16 Protected services 9 11 13 6 Example: p-Cycle-Based Protected Working Capacity Envelope (PWCE) No per-connection protection path establishment No protection path signaling required for failure recovery

I want to establish a protected connection to node 11 0 1 2 3 4 5 RESV 7 8 PATH 9 6 10 RESV PATH 11 No LSA flooding unless the envelope capacity on the span is used up 12 RESV PATH PWCE-Operational Steps for Service Provisioning PWCE route computation and signaling process 1. Compute working route: 0-4-8-11 2. Establish working path

Key Ideas / Philosophy of PWCE • For protected services, “if you can route it (through the PWCE), it IS protected.” • PWCE does not disseminate link state information per connection, or any protection information during service provisioning. • PWCE provides observability on the approach to blocking, i.e., toward the edge of the operating envelope. • Onset of blocking under SBPP is less observable. • If demand pattern evolves, one can adapt the envelope by changing the partitioning of total capacity

Globally Optimal Distributed Batch Reconfiguration for Hazard-free Dynamic Provisioning: