Intro to Backpressure and Service Discipline Effects on CM Mitch Gusat, 6 Sept. 2006 IBM Zurich Research Lab
Analytical Method
• Method used in this presentation
  • explicit assumptions
  • simple traffic scenario
  • reduced MIN (multistage interconnection network) with static, deterministic (fixed) routing
• This 'model' considers
  • queuing: in the Ethernet Channel Adapter (ECA) and the switch element (SE)
  • scheduling: in ECA and SE
  • Ethernet's per-priority PAUSE-based link-level flow control (LL-FC, aka backpressure, BP)
  • reactive congestion management (CM) à la BCN
• Linearization around steady state => tractable static analysis
  • salient transients will be mentioned, but not computed
• Compute the cumulative effects of
  • scheduling
  • LL-FC backpressure per priority (only one used here)
  • CM source throttling (rate adjustment)
• Not computed: the formulas for
  • blocking probability per stage and SE
  • variance of the service time distribution
• To be done: validate results against simulations
Model and Traffic Assumptions
• Traffic = ∑(background + hot): "A total of 50% of link rate is attempted from 9 queues (8 background + 1 hot) from each ECA."
• Bgnd traffic:
  • 8 queues per ECA on the left; each of the 8 queues is connected to one of the 8 ECAs on the right => 64 flows (8 queues/ECA x 8 ECAs) on the left, each injecting packets
  • "80% of these [total link rate] are background, that is 80% x 50% = 40% of link rate." => background traffic intensity λ = 0.4, uniformly distributed in space
• Hot traffic: "20% of these are hot, so hot traffic is 20% x 50% = 10% of link rate."
[Figure: the reduced MIN with per-link load labels: .4 (background) on every link, plus a hot increment of +.1, +.2, +.4 or +.8.]
120% Link Load => 20% Overload: What Happens Next?
• Hotspot arrival intensity: λbgnd + λhot = .4 + .8 = 1.2 > 1 => overload, [mild] congestion factor cf = 1.2 @ SE(L2,S3) ... next?
• BP and CM will react
  • if SE(L2,S3) is work-conserving, the 0.2 overload must be losslessly squelched by CM and BP
• The exact sequence depends on the actual traffic, SE architecture and threshold settings
  • irrelevant for static analysis, albeit important in operation
• Separation of concerns -> study the independent effects of BP (1st) and CM (2nd)
  • iff the system is linear in steady state -> superposition allows composing the effects
[Figure: MIN with stages S1-S3 and rows L1-L4, link labels .4 plus hot increments (+.1, +.2, +.4); cf = 1.2 marked at SE(L2,S3), where BP and CM act, next to a link labelled .4 + .6.]
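The load arithmetic behind the 1.2 figure can be sketched as follows. This is only an illustration: it assumes an 8-ECA MIN built from 2x2 SEs, so that the hot contribution doubles at each of the three stages (+.1 per ECA, then +.2, +.4, +.8), while the background share on a link stays at .4. The names `stage_loads` and `congestion_factor` are hypothetical helpers, not part of the presentation.

```python
# Sketch of the per-stage load arithmetic (assumption: hot flows merge
# pairwise at each of the three stages S1..S3 of a MIN of 2x2 SEs).
LINK_SHARE = 0.5      # each ECA attempts 50% of link rate
BGND_FRACTION = 0.8   # 80% of the attempted traffic is background
HOT_FRACTION = 0.2    # 20% of the attempted traffic is hot
STAGES = 3            # S1, S2, S3


def stage_loads(stages=STAGES):
    """Return (bgnd, hot, total) on the hot path after each stage."""
    bgnd = LINK_SHARE * BGND_FRACTION   # 0.4 of link rate
    hot = LINK_SHARE * HOT_FRACTION     # 0.1 of link rate per ECA
    loads = []
    for _ in range(stages):
        hot *= 2                        # two hot sub-flows merge per stage
        loads.append((bgnd, hot, bgnd + hot))
    return loads


def congestion_factor(arrival, service=1.0):
    """Ratio of offered load to service rate; > 1 means overload."""
    return arrival / service


if __name__ == "__main__":
    for stage, (bgnd, hot, total) in enumerate(stage_loads(), start=1):
        print(f"S{stage}: bgnd={bgnd:.1f} hot={hot:.1f} total={total:.1f}")
    hotspot_total = stage_loads()[-1][2]
    print(f"cf = {congestion_factor(hotspot_total):.1f}")
```

Under these assumptions the last stage carries .4 + .8, which is the 20% overload that BP and CM have to absorb.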
Link-Level FC Will Back-Pressure: Whom? How Much? Who's 1st?
• Depends on the SE's service discipline
• Most well-understood and widely used disciplines
  • Round-Robin (RR) versions: strict (non-WC) and work-conserving (skips invalid queues)
  • FIFO, aka FCFS, aka EDF (timestamps, aging)
  • Fair Queuing, WRR, WFQ
• A future 802.3x should standardize only the LL-FC, not its 'fairness'
[Figure: hotspot SE whose buffers fill up. Labels: bgnd + hot' = .4 + .4 = .8; hot'' = .4; total .8 + .4 = 1.2 > 1; upstream links marked "Stop1?" and "Stop2?".]
EDF-based BP: FCFS-type of Fairness (subset of max-min)
• New EDF-fair TX rates are backpropagated:
  λ' = (1 - θ) * λ ≈ 0.834 * λ, with θ = 1 - μ_j / ∑_i λ_ij
  • incremental upstream traversal rooted on SE(L2,S3)
Hint: subtract the bgnd traffic λ = .4 from the EDF-fair rates and compare with the previous hot rates.
Obs.: Under moderate-to-severe congestion, θ -> 1 => λ' -> 0: blocking spreads across all ingress branches => neither parking-lot 'unfairness' nor flow decoupling is possible (wide-canopy saturation tree).
* All flows sharing resources along the hot paths are backpressured in proportion to their respective contribution (not their traffic class). No flow isolation.
[Figure: MIN (S1-S3, L1-L4) with EDF-fair link rates after BP: 1.0 at the hotspot output; .734, .666, .566, .5, .483 and .417 on upstream links.]
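A minimal sketch of the EDF-fair scaling at a single SE output, to make the formula concrete. It assumes a service rate μ = 1.0 and two inputs offering .8 and .4 (values read off the figure labels, so treat them as an assumption); `edf_backpressure` is a hypothetical name, and the incremental upstream traversal is not modelled.

```python
# Sketch: proportional (EDF/FCFS-type) backpressure at one SE output.
# Assumptions: mu = 1.0 and inputs of 0.8 and 0.4, taken from the slide figure.

def edf_backpressure(arrivals, mu=1.0):
    """Scale every input by (1 - theta), with theta = 1 - mu / sum(arrivals)."""
    total = sum(arrivals)
    if total <= mu:
        return list(arrivals)        # no overload, nothing to throttle
    theta = 1.0 - mu / total         # fraction of traffic to squelch
    return [(1.0 - theta) * lam for lam in arrivals]


if __name__ == "__main__":
    rates = edf_backpressure([0.8, 0.4])
    print([round(r, 3) for r in rates])   # prints [0.667, 0.333]
```

With a 1.2 load the exact factor is 1/1.2 ≈ 0.833, which the slide rounds to 0.834. Every input is cut by the same proportion, which is why no branch is isolated from the blocking.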
RR-based BP: Prop. Fairness, Selective and Drastic
• New RR-fair TX rates are iteratively computed and backpropagated
  1. identify the INs (inputs) exceeding their RR quota, as members of N' ≤ N
  2. distribute the overload δ across N':
     δ_ij' = (N * λ_ij - μ_j) / (N * N'), with δ_ij' ≤ δ for work-conserving service
  3. recompute the new admissible arrival rates λ_ij' = λ_ij - δ_ij', incrementally, by upstream traversal rooted on SE(L2,S3)
  3'. with strict RR the bound δ_ij' ≤ δ no longer holds => the BP effects are drastic and focused!
Hint: subtract the bgnd traffic λ = .4 from the RR-fair rates and compare with the previous hot rates.
Obs. 1: Only the selected branch is BP-ed (discrimination) => RR-BP blocking always discriminates between ingress branches.
Obs. 2: Under severe congestion and/or many hops, the selected branches will be swiftly choked down (bonsai: narrow trees).
[Figure: MIN (S1-S3, L1-L4) with RR-fair link rates after BP: 1.0 at the hotspot output; .8, .6 / .5, .6, .5, .4 / .25 + .2 / .15 and .3 on upstream links.]
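Steps 1 to 3 can be sketched for a single SE output as below. Several things here are assumptions rather than statements from the slide: the per-input cut is read as (N·λ − μ) / (N·N'), the inputs are .8 and .4 with μ = 1.0, and the ".6 / .5" label in the figure is taken to mean work-conserving / strict. `rr_backpressure` is a hypothetical name, and the upstream traversal is not modelled.

```python
# Sketch: RR-fair backpressure at one SE output (steps 1-3 of the slide).
# Assumptions: cut = (N*lam - mu) / (N*N'), mu = 1.0, inputs 0.8 and 0.4.

def rr_backpressure(arrivals, mu=1.0, work_conserving=True):
    """Throttle only the inputs that exceed their round-robin quota mu/N."""
    n = len(arrivals)
    total = sum(arrivals)
    if total <= mu:
        return list(arrivals)                 # no overload
    quota = mu / n
    # Step 1: inputs exceeding the RR quota (the set counted by N').
    offenders = [i for i, lam in enumerate(arrivals) if lam > quota]
    n_prime = len(offenders)
    delta = total - mu                        # overload to be removed
    new_rates = list(arrivals)
    for i in offenders:
        # Step 2: share of the overload charged to this input.
        cut = (n * arrivals[i] - mu) / (n * n_prime)
        if work_conserving:
            cut = min(cut, delta)             # never remove more than the overload
        # Step 3: new admissible arrival rate.
        new_rates[i] = arrivals[i] - cut
    return new_rates


if __name__ == "__main__":
    wc = rr_backpressure([0.8, 0.4], work_conserving=True)
    strict = rr_backpressure([0.8, 0.4], work_conserving=False)
    print([round(r, 3) for r in wc])      # prints [0.6, 0.4]
    print([round(r, 3) for r in strict])  # prints [0.5, 0.4]
```

Under this reading the .4 input stays below its quota of .5 and is left alone, while the .8 branch alone is cut to .6 (work-conserving) or .5 (strict), matching the selective behaviour described in Obs. 1.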
20% Overload: Reaction According to CM
• What's the effect of CM only, with no LL-FC BP?
• Congestion factor cf = 1.2:
  1. Marking by SE(L2,S3)
     • is done at flow resolution (queue connection here)
     • is based on SE queue occupancy and a set of thresholds (a single one here, @8)
     • if fair with p = 1%, BCN marking is pro-rated 33% (bgnd) + 67% (hot)
  2. ECA sources adapt their injection rate
     • per e2e flow
• Desired result: convergence to proportionally fair stable rates λbgnd + λCM_hot = O(.33 + .67), achievable via fair marking by SE
• Achieved... (see Jinjing Jiang and Raj Jain proof at SD'06)
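The 33% / 67% split follows from pro-rating the marks by each aggregate's contribution to the hotspot load. A minimal sketch, assuming "fair" marking means proportional to arrival intensity (.4 background, .8 hot) and using the hypothetical helper `marking_shares`:

```python
# Sketch: pro-rated marking shares at the hotspot.
# Assumption: "fair" marking is proportional to each aggregate's arrival rate.

def marking_shares(loads):
    """Map each traffic aggregate to its fraction of the total arrival rate."""
    total = sum(loads.values())
    return {name: lam / total for name, lam in loads.items()}


if __name__ == "__main__":
    shares = marking_shares({"bgnd": 0.4, "hot": 0.8})
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")   # prints bgnd: 33% then hot: 67%
```

The same fractions, applied to a link capacity of 1.0, give the .33 + .67 target rates quoted as the desired result.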
20% Overload: Reaction According to LL-FC
• Strictly depends on the service discipline
• 802 shouldn't mandate scheduling to switch vendors, because
  • Round-Robin (RR: strict or work-conserving)
    • strong/prop. fairness
    • decouples flows
    • simple & scalable
    • globally unfair (parking lot problem)
  • FIFO/EDF (timestamps)
    • temporally & globally fair: first-come-first-served
    • locally unfair => flow coupling (can't isolate across partitions and clients)
    • complex to scale
• The discipline will impact the speed, strength and locality (fairness) of the backpressure underlying CM
  • hence different behaviors of the CM loop
Conclusions
1. We have two coupled reactive loops: BP (fast) + CM (slow)
  • BP/LL-FC modulates CM's convergence: the +/- phase and amplitude depend on topology, RTTs, traffic and SE
2. LL-FC will cause BP avalanches, aka saturation trees
  • incidence grows with topology and number of hops
  • major issue for MINs and k-ary n-cube nets
3. SE's service discipline impacts BP => CM
4. CM should trigger earlier than BP => the two mechanisms, albeit 'independent', should be co-designed and co-tuned
• We must analyze CM with an array of BP and scheduling schemes, or start ASAP an 802.3x follow-up effort, "Lossless LL-FC"...
That's all...