
What Lies Beneath: Understanding Internet Congestion

Presentation Transcript


  1. What Lies Beneath: Understanding Internet Congestion
     • Leiwen Deng and Aleksandar Kuzmanovic, Northwestern University
     • Bruce Davie, Cisco Systems
     • http://networks.cs.northwestern.edu

  2. Common Wisdom and Our Key Results
     • Common wisdom: no congestion in the Internet core
       • Links are over-provisioned, hence no congestion
     • Common wisdom: no correlation among congestion events in the Internet
       • The diversity of traffic and links makes large-scale, long-lasting dependence between congested links unlikely
     • Our key results
       • A subset of links (both inter-AS and intra-AS) exhibits strong congestion intensity
       • Congestion events in the core can be highly correlated (across up to 3 ASes)

  3. Why Do We Care?
     • Congestion in the core
       • Can depend on internal network policies or complex inter-AS relationships
       • Variable queuing delay can lead to jitter, affecting VoIP or streaming applications
     • Correlation
       • Provides guidelines for re-routing systems
       • Most tomography models assume that link congestion events are independent

  4. Challenges
     • Scalability
       • How can we concurrently monitor a large number of Internet links?
       • Need a lightweight monitoring tool
       • Need a triggered monitoring system
     • Our approach
       • Pong: a lightweight monitoring tool
         • Per-path overhead: 18 kbps
       • TPong: a triggered monitoring system
         • Capable of monitoring up to 8,000 links concurrently

  5. Congestion Events
     • Congestion intensity
       • How frequently do queue build-ups occur over 30-second time scales?
     • We focus on persistent congestion events:
       • Intensity > 5%; duration > 2 minutes
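The persistence rule above can be illustrated with a minimal sketch that flags persistent events from probe samples. The slide does not define congestion intensity precisely. This sketch assumes it is the fraction of probes in each 30-second window whose queuing delay exceeds a threshold. The `delay_threshold` parameter, the function names, and the `(timestamp, queuing_delay)` sample format are hypothetical.

```python
from dataclasses import dataclass

# Parameters from the slide; the intensity definition below is an assumption.
WINDOW_S = 30.0          # 30-second time scale
INTENSITY_MIN = 0.05     # persistent event: intensity > 5%
DURATION_MIN_S = 120.0   # persistent event: duration > 2 minutes


@dataclass
class Event:
    start: float  # seconds
    end: float    # seconds


def window_intensities(samples, delay_threshold):
    """Assumed definition: for each 30 s window, the fraction of probes
    whose queuing delay exceeds delay_threshold (hypothetical parameter).

    samples: iterable of (timestamp_s, queuing_delay_s) pairs.
    """
    totals, congested = {}, {}
    for t, q in samples:
        w = int(t // WINDOW_S)
        totals[w] = totals.get(w, 0) + 1
        if q > delay_threshold:
            congested[w] = congested.get(w, 0) + 1
    return {w: congested.get(w, 0) / n for w, n in totals.items()}


def persistent_events(intensities):
    """Merge consecutive windows with intensity > 5% into runs and keep
    only runs lasting longer than 2 minutes."""
    runs = []  # each run is [first_window, last_window + 1]
    for w in sorted(w for w, x in intensities.items() if x > INTENSITY_MIN):
        if runs and runs[-1][1] == w:
            runs[-1][1] = w + 1          # window extends the current run
        else:
            runs.append([w, w + 1])      # gap: start a new run
    return [Event(a * WINDOW_S, b * WINDOW_S)
            for a, b in runs if (b - a) * WINDOW_S > DURATION_MIN_S]
```

For example, if 10% of probes are delayed in five consecutive windows (150 s), the sketch reports one persistent event. Four such windows (120 s) would not qualify, because the duration must exceed 2 minutes. Treating windows without probes as gaps is a conservative choice.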

  6. Coordinated Probing
     • [Figure: 4-packet (4-p) probing between endpoints S and D in a symmetric path scenario, with f, b, s, and d probes]
     • Combines end-to-end (e2e) and router-targeted probing

  7. Pong: Coordinated Probing
     • [Figure: probes between endpoints S and D; labels: locating congestion points, tracing congestion status, half-path queuing delay, Δf, Δb, Δs, Δd, Δfs, Δfd]
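The Δ quantities in the figure are queuing delays. A common way to estimate queuing delay from delay measurements is baseline subtraction. The minimum delay seen on a path is treated as its propagation plus transmission delay, and anything above it is attributed to queuing. The sketch below uses that standard technique; it is not necessarily how Pong computes Δf, Δb, Δs, or Δd. The half-path helper is a hypothetical illustration of isolating the queuing on part of a path by differencing two concurrent probes.

```python
def queuing_delays(delays_s):
    """Baseline subtraction: treat the minimum observed delay on a path
    as its propagation + transmission delay, so the excess is queuing."""
    base = min(delays_s)
    return [d - base for d in delays_s]


def half_path_delay(full_q_s, prefix_q_s):
    """Hypothetical half-path estimate: if one probe crosses the whole
    path and a concurrent probe crosses only its first half, the
    difference approximates queuing on the second half. Clamped at zero
    because measurement noise can make the difference negative."""
    return max(full_q_s - prefix_q_s, 0.0)
```

The min-filter baseline is simple but assumes that at least one probe crossed the path with empty queues. On a persistently congested link, this assumption can underestimate queuing delay.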

  8. Pong: Methodology Highlights
     • Coordinated probing
       • Send 4, 3, or 2 packets from two endpoints
     • Quality of Measurability (QoM)
       • Able to deterministically detect its own inaccuracy
     • Self-adaptivity
       • Switch among different probing schemes based on QoM and path properties
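Self-adaptivity can be sketched as a selection rule over the 4-, 3-, and 2-packet schemes. The rule itself is hypothetical: the slide does not say how Pong chooses. This sketch assumes each scheme has a current QoM value in 0..1 and that the 4-packet scheme applies only to symmetric paths. That second assumption is suggested by the "symmetric path scenario" label on the coordinated-probing figure. The `min_qom` cutoff and the tie-break toward fewer packets are also assumptions.

```python
from typing import Dict, Optional

# Packets per probing scheme, from the slide ("send 4, 3, or 2 packets").
PACKETS = {"4-p": 4, "3-p": 3, "2-p": 2}


def choose_scheme(qom: Dict[str, float], path_symmetric: bool,
                  min_qom: float = 0.5) -> Optional[str]:
    """Hypothetical selection rule: among schemes whose current QoM
    (assumed to lie in 0..1) meets min_qom, pick the highest QoM and
    break ties in favor of fewer packets. 4-p is assumed usable only on
    symmetric paths."""
    candidates = [
        s for s, q in qom.items()
        if s in PACKETS and q >= min_qom and (s != "4-p" or path_symmetric)
    ]
    if not candidates:
        return None  # no scheme is currently trustworthy on this path
    return max(candidates, key=lambda s: (qom[s], -PACKETS[s]))
```

Returning `None` mirrors the QoM idea that the tool should know when it cannot measure well, rather than report a low-quality result.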

  9. Vantage Point Selection Problem
     • How do we select vantage points to accurately measure congestion at a given link?
     • Link measurability score
       • How well we can measure a specific link from a specific pair of endpoints; a function of:
         • Quality of measurability (QoM) for a given node
         • Queuing-delay threshold quality
     • Observability score
       • Avoid paths that "see" multiple congested links concurrently
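A minimal sketch of vantage-point selection as "pick the endpoint pair with the best score for the target link" follows. The slide names the inputs but not how they are combined. The product of the two quality terms and the observability discount below are hypothetical. So are the 0..1 scales and the `Candidate` fields.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    pair: Tuple[str, str]        # (source, destination) endpoints
    qom: float                   # quality of measurability, assumed 0..1
    threshold_quality: float     # queuing-delay threshold quality, assumed 0..1
    other_congested: int         # congested links on the path besides the target


def measurability(c: Candidate) -> float:
    """Hypothetical score: the two quality terms multiplied together,
    discounted when the path also 'sees' other congested links."""
    observability = 1.0 / (1 + c.other_congested)
    return c.qom * c.threshold_quality * observability


def select_vantage_points(cands: List[Candidate]) -> Optional[Tuple[str, str]]:
    """Pick the endpoint pair with the highest score for a target link."""
    if not cands:
        return None
    return max(cands, key=measurability).pair
```

With this scoring, a pair with high QoM can still lose to a slightly noisier pair whose path crosses no other congested links. That is the intent of the observability criterion.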

  10. Triggered Monitoring System
     • Greedy algorithm to determine which subset of paths to monitor
       • Covered 65% (7,800) of links with 4.9% (1,750) of paths
     • Limit the per-node measurement overhead
     • Priority-based Pong path allocation
       • Maximize quality of measurability
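The slide does not detail the greedy algorithm. A natural reading is the classic greedy set-cover heuristic: repeatedly choose the path that exposes the most not-yet-covered links. The sketch below implements that heuristic. It adds a hypothetical per-node cap on monitored paths (`max_paths_per_node`) as a stand-in for limiting per-node measurement overhead. The data layout, mapping each path to the set of links it traverses, is also an assumption.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Path = Tuple[str, str]  # (source node, destination node)


def greedy_cover(paths: Dict[Path, FrozenSet[str]],
                 max_paths_per_node: int) -> List[Path]:
    """Greedy set cover: repeatedly pick the path that exposes the most
    not-yet-covered links, skipping paths whose endpoints have already
    reached the per-node cap (hypothetical overhead limit)."""
    covered = set()
    load = Counter()          # monitored paths per node
    chosen = []
    remaining = dict(paths)
    while remaining:
        best, gain = None, 0
        for p, links in remaining.items():
            if load[p[0]] >= max_paths_per_node or load[p[1]] >= max_paths_per_node:
                continue
            g = len(links - covered)
            if g > gain:
                best, gain = p, g
        if best is None:      # nothing adds coverage within the cap
            break
        chosen.append(best)
        covered |= remaining.pop(best)
        load[best[0]] += 1
        load[best[1]] += 1
    return chosen
```

Greedy set cover tends to pick a few long, diverse paths first. This is consistent in spirit with a small fraction of paths covering most links, although the exact ratio depends on the topology.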

  11. Coverage & Overhead Statistics
     • We observe ~36,000 paths
       • ≈ N², N = 191 nodes
     • Expose ~12,100 links at a time
       • Due to routing changes, we observe ~29,000 links in total
     • TMon paths
       • Up to 2,000 paths running fast-rate probing concurrently
       • Cover up to 8,000 links concurrently
       • 4.9% of paths cover 65% of total links
     • Pong paths
       • Up to 30 Pong paths, covering up to 350 links concurrently
     • Overhead per node
       • Average: 30 kbps; peak: 68 kbps
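As a quick arithmetic check, the coverage figures fit together as follows. Only numbers stated on the coverage slides are used; the choice of which counts to divide is ours.

```python
N = 191                          # measurement nodes
paths_n2 = N * N                 # the slide's N^2 estimate: 36481
paths_distinct = N * (N - 1)     # ordered pairs of distinct nodes: 36290

tmon_paths, tmon_links = 1_750, 7_800   # paths chosen and links they cover
exposed_links = 12_100                  # links exposed at a time

path_fraction = tmon_paths / 36_000     # share of the ~36,000 observed paths
link_fraction = tmon_links / exposed_links

print(f"paths used: {path_fraction:.1%}, links covered: {link_fraction:.1%}")
```

This prints 4.9% of paths and 64.5% of links, in line with the rounded "4.9% of paths cover 65% of links" claim.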

  12. Measurement Quality
     • How good is our vantage-point selection algorithm?
     • Link measurability score: 0 to 6
       • 65% of measurement samples have a non-zero score
       • 80% of measurements are better than "fair"
       • 60% of measurements are better than "good"
     • The key point: we know how well (or how badly) we are measuring

  13. Key Findings
     • Time-invariant hot spots
     • Strong spatial correlation among congested links
     • Root-cause analysis

  14. Time-invariant Hot Spots
     • Time-of-day effects in the number of congestion events
     • A small number of links show strong, time-invariant congestion intensity

  15. Time-invariant Hot Spots
     • Contrary to our initial hypothesis, most of these links are not inter-continental links
     • They are inter-AS links between large backbone networks, as well as intra-AS links within these networks

  16. Congestion Correlation
     • Pair-wise correlation
       • Percent of time two links are concurrently congested
       • Pair-wise correlation can be quite extensive, e.g., 20% of pairs have correlation greater than 0.7
     • Correlation is higher on weekends than on weekdays
       • The overall congestion level is lower during weekends
     • Distance between correlated link pairs
       • Up to 3 ASes
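The pair-wise correlation can be sketched from per-interval congestion flags. The slide defines correlation as the percent of time two links are concurrently congested but does not give the normalization. This sketch assumes the denominator is the time during which at least one of the two links is congested. The boolean series format and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pair_correlation(a: List[bool], b: List[bool]) -> float:
    """Assumed normalization: of the intervals in which at least one of
    the two links is congested, the fraction in which both are."""
    either = sum(x or y for x, y in zip(a, b))
    both = sum(x and y for x, y in zip(a, b))
    return both / either if either else 0.0


def correlated_pairs(series: Dict[str, List[bool]],
                     threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """All link pairs whose correlation exceeds the threshold."""
    out = []
    for l1, l2 in combinations(sorted(series), 2):
        c = pair_correlation(series[l1], series[l2])
        if c > threshold:
            out.append((l1, l2, c))
    return out
```

Normalizing by the union keeps rarely congested links from looking highly correlated just because they are quiet most of the time.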

  17. Aggregation Effect Hypothesis
     • Hypothesis
       • When upstream traffic converges to a relatively thin aggregation point, traffic surges on an upstream link are likely to create congestion at a thin downstream aggregation link
     • Insights
       • Aggregation points correspond to time-invariant hot spots
       • Interaction between an aggregation point and an upstream link causes link-level correlation
     • [Figure: upstream links converging on an aggregation link]
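A toy fluid model can show how the aggregation effect produces correlated congestion. The capacities and rates are hypothetical. There are no queue dynamics, and a link counts as "congested" in a time step when its offered load reaches its capacity. The thin aggregation link carries whatever the upstream links actually deliver.

```python
def congestion_series(upstream_rates, upstream_caps, agg_cap):
    """Toy fluid model (hypothetical numbers, no queue dynamics).

    upstream_rates: list of time steps, each a dict link -> offered rate
    upstream_caps:  dict link -> capacity
    agg_cap:        capacity of the downstream aggregation link
    Returns per-link congestion flags for upstream links and the
    aggregation link.
    """
    up_congested = {name: [] for name in upstream_caps}
    agg_congested = []
    for step in upstream_rates:
        delivered = 0.0
        for name, cap in upstream_caps.items():
            offered = step.get(name, 0.0)
            up_congested[name].append(offered >= cap)
            delivered += min(offered, cap)   # an upstream link cannot exceed its capacity
        agg_congested.append(delivered >= agg_cap)
    return up_congested, agg_congested
```

With two upstream links of capacity 10 and an aggregation link of capacity 6, baseline loads of 2 + 2 fit easily. A surge to 12 on one upstream link congests that link and the aggregation link in the same step. The two links therefore look correlated even though they may be several hops apart.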

  18. Root-cause Analysis: Example
     • [Figure: example with link capacities of 622 Mbps and 10 Gbps]
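The slide gives only the two capacities. Assuming the example shows traffic from a 10 Gbps link converging onto a 622 Mbps link, the capacity mismatch can be quantified directly. The helper name is hypothetical.

```python
def saturating_utilization(upstream_bps: float, downstream_bps: float) -> float:
    """Fraction of the upstream link's capacity that, if all of it is
    forwarded onto the downstream link, fills that link completely."""
    return downstream_bps / upstream_bps


u = saturating_utilization(10e9, 622e6)   # 10 Gbps feeding 622 Mbps
print(f"{u:.2%}")
```

This prints 6.22%. Under that assumption, the upstream link needs to be only lightly loaded for the thinner link to saturate.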

  19. Final Statistics
     • Table 1: Matched locations in the top ten networks, ranked by number of peers (regions: Europe, North America, Asia)
     • Table 2: Matched locations in the top ten ISPs that most aggressively promote customer access

  20. Conclusions
     • Triggered monitoring system
       • Measures congestion in a scalable way
       • Key feature: selects vantage points to measure congestion as a function of measurement quality
     • Key findings
       • A subset of links experiences time-invariant high congestion intensity
       • There is strong correlation among congestion events at different links (up to 3 ASes apart)
       • Root cause: aggregation effect; some links are thinner than others
