Measurement in the Internet

Outline • Internet topology • Bandwidth estimation • Tomography • Workload characterization • Routing dynamics

Why study Internet topology? • General understanding of growth of Internet • Fragility/robustness to failures and attacks • Are there feasible design principles to: • improve robustness • reduce deployment/growth costs • make maintenance/support easier • improve performance for users/customers • Realistic input to simulators

"topology" - misleading word • Unlabelled graph links do not capture the problem. • BGP routing behavior is determined by policies, not just connectivity. • Peers, customers and providers are very different. • Bandwidth, latency, and congestion at the router level matter. • A small ISP peering link is not the same as a large ISP backbone link.

Scales/Hierarchies of topology • Routing/BGP connectivity of ASes or ARDs • What are the connectivity patterns between organizations? Are there cluster patterns? • Geographic/logical clusters within large organizations (particularly ISPs) • Router-level • Switches, hubs, firewalls, hosts

traceroute
Tracing route to cider.caida.org [192.172.226.123] over a maximum of 30 hops:
   1  <10 ms  <10 ms   10 ms  172.16.0.254
   2  <10 ms  <10 ms  <10 ms  ntc-1-rsmx.rswitch.umn.edu [128.101.10.254]
   3   40 ms   30 ms   90 ms  ntc-1-rsmx.rswitch.umn.edu [192.168.100.22]
   4  <10 ms  <10 ms   10 ms  tc3x.router.umn.edu [160.94.26.2]
   5  <10 ms  <10 ms   10 ms  telecomb-52-g-0-2.router.umn.edu [160.94.26.114]
   6  <10 ms  <10 ms   10 ms  telecomb-53-g-0-2.router.umn.edu [160.94.26.118]
   7  <10 ms  <10 ms   10 ms  tc1-g-2-0.router.umn.edu [160.94.26.122]
   8  <10 ms  <10 ms   10 ms  i2r-a-0-1-0-23.northernlights.gigapop.net [192.42.152.206]
   9   30 ms   30 ms   30 ms  abilene-mn.northernlights.gigapop.net [192.42.152.169]
  10   30 ms   30 ms   40 ms  kscyng-iplsng.abilene.ucaid.edu [198.32.8.81]
  11   40 ms   40 ms   40 ms  dnvrng-kscyng.abilene.ucaid.edu [198.32.8.13]
  12   60 ms   70 ms   70 ms  snvang-dnvrng.abilene.ucaid.edu [198.32.8.1]
  13   80 ms   80 ms   80 ms  losang-snvang.abilene.ucaid.edu [198.32.8.94]
  14   70 ms   80 ms   80 ms  hpr-lax-gsr1--abilene-LA-10ge.cenic.net [137.164.25.2]
  15   80 ms   81 ms   80 ms  sdg-hpr1--lax-hpr1-10ge.-l3.cenic.net [137.164.25.5]
  16   80 ms   80 ms   80 ms  hpr-sdsc-sdsc2--sdg-hpr-ge.cenic.net [137.164.27.54]
  17   80 ms   80 ms   80 ms  pinot.sdsc.edu [198.17.46.56]
  18    *       *       *     Request timed out.

How traceroute works • All IP packets have a Time-To-Live (TTL) field specifying the number of router hops the packet is allowed to be in the network. • When an IP device (router or host) receives a packet: • if the packet is for the device, the device processes the packet • otherwise, decrement the TTL • if TTL > 0, forward the packet towards the destination • if TTL = 0, drop the data packet and send an error packet back to the source
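The per-device rule above can be sketched in a few lines. This is a minimal illustration, not router code: the `Packet` class, the `handle_packet` function, and the action strings are hypothetical names chosen for this example. In real IP networks the error packet is an ICMP Time Exceeded message.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str   # source IP address
    dst: str   # destination IP address
    ttl: int   # remaining router hops


def handle_packet(device_addr: str, pkt: Packet) -> str:
    """Return the action an IP device takes for an arriving packet."""
    if pkt.dst == device_addr:
        return "deliver"          # the packet is for this device
    pkt.ttl -= 1                  # otherwise, decrement the TTL
    if pkt.ttl > 0:
        return "forward"          # still allowed to travel further
    return "drop_and_report"      # TTL hit 0: drop, send error to pkt.src
```

A packet sent with TTL 1 therefore never gets past the first router, which is the property traceroute exploits.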

How traceroute works • traceroute tries to measure the forward path (one direction only) from source to destination • each router hop on the path is found one at a time • source sends a packet with TTL 1 and waits for an error from the router 1 hop away, then uses the source IP address of the error as the identity of this hop • source repeats with larger TTLs until it reaches the destination (or gives up)
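The probing loop can be sketched against a simulated path, so that it runs without network access. Everything here is an assumption made for illustration: `probe` stands in for sending a real packet and waiting for the reply, and `path` is a non-empty list of hop names ending with the destination.

```python
def probe(path, ttl):
    """Simulate one probe and return (responder, reached_destination).

    A probe with TTL t expires at the t-th hop, unless the destination
    comes first, in which case the destination itself answers.
    """
    index = min(ttl, len(path)) - 1
    return path[index], index == len(path) - 1


def traceroute(path, max_hops=30):
    """Discover hops one at a time by sending probes with TTL 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, done = probe(path, ttl)
        hops.append(responder)    # identity of this hop
        if done:
            break                 # destination reached
    return hops                   # may be incomplete if max_hops ran out


print(traceroute(["r1", "r2", "r3", "dst"]))
```

The sketch ignores lost probes and routers that never answer, both of which show up as `* * *` in real output.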

How traceroute works (figure: Router A with interfaces AI, AO, AR) • However, there are multiple potential choices for the IP address in the message from an intermediary hop. • Every interface on a router has a different IP address. • AI - input interface to A from source • AO - output interface towards destination from A • AR - return path interface towards source from A

traceroute to topology • Apply traceroute methodology from multiple sources to multiple destinations to discover links. • The number of sources and destinations necessary is not clearly known. • There are diminishing returns in discovering new links, but it is not always clear whether the missing links are important. • We know that coverage is poor in some cases, but how poor?
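A small sketch of how traces become a link set, and how the diminishing returns could be observed. The function names and the toy traces are hypothetical, and treating links as undirected is a simplifying assumption (each trace only observes the forward direction).

```python
def links_from_trace(hops):
    """Turn one traceroute hop list into a set of undirected links."""
    return {frozenset(pair) for pair in zip(hops, hops[1:])
            if pair[0] != pair[1]}


def new_links_per_trace(traces):
    """For each trace, in order, count links not seen in earlier traces."""
    seen = set()
    counts = []
    for hops in traces:
        found = links_from_trace(hops)
        counts.append(len(found - seen))
        seen |= found
    return seen, counts


traces = [
    ["s1", "a", "b", "d1"],
    ["s1", "a", "b", "d2"],
    ["s2", "a", "b", "d1"],
]
links, counts = new_links_per_trace(traces)
print(counts)  # [3, 1, 1]
```

In this toy case the later traces mostly re-cross links already seen. Unresponsive hops are not handled here.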

traceroute and routers • traceroute only finds interface IP addresses, so we need a way to collapse those on the same router • load-balancing and non-atomicity can lead to false links
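Once some method has decided that two interface addresses belong to the same router, collapsing them is a grouping problem. The sketch below uses union-find for that step only. How alias pairs are detected is not covered here, so `alias_pairs` is simply assumed as input, every address in a pair is assumed to appear in `interfaces`, and the addresses are documentation-range placeholders.

```python
def collapse_aliases(interfaces, alias_pairs):
    """Group interface addresses into routers, given known alias pairs."""
    parent = {ip: ip for ip in interfaces}

    def find(ip):
        # Follow parent pointers to the group representative,
        # shortening the chain along the way (path halving).
        while parent[ip] != ip:
            parent[ip] = parent[parent[ip]]
            ip = parent[ip]
        return ip

    for a, b in alias_pairs:
        parent[find(a)] = find(b)   # merge the two groups

    routers = {}
    for ip in interfaces:
        routers.setdefault(find(ip), set()).add(ip)
    return sorted(sorted(group) for group in routers.values())


interfaces = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
alias_pairs = [("192.0.2.1", "192.0.2.2"), ("192.0.2.2", "192.0.2.3")]
print(collapse_aliases(interfaces, alias_pairs))
```

Aliases are transitive, which is why a pairwise comparison alone is not enough and a grouping structure is used.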

Big questions for topology • We know we can't see all backup and peering links. • How much might we really be off? • What set of possible "actual networks" could lead to what has been measured, and can we assign probabilities? • How much does it matter for different problems? • Are there ways of targeting measurement to improve coverage? • How do we understand the network with partial link characteristic or traffic information?

Why bandwidth estimation? • Not all link bandwidths and utilizations are the same. • Realistic inputs to simulators and models. • End hosts and routers may want to make intelligent decisions based on more knowledge about the network.

Bandwidth estimation • Capacity vs. available bandwidth • The network does not directly expose this information. • May be variable over short time scales. • Cross-traffic can cause confusion. • Forward and return paths are convolved in some techniques.
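The difference between capacity and available bandwidth can be made concrete with a toy path. The sketch assumes each link is described by a hypothetical `(capacity_mbps, utilization)` pair and that available bandwidth on a link is its capacity times the unused fraction. The numbers are invented for illustration.

```python
def path_capacity(links):
    """Path capacity is limited by the smallest link capacity."""
    return min(capacity for capacity, _ in links)


def path_available_bandwidth(links):
    """Available bandwidth is limited by the link with the least spare capacity."""
    return min(capacity * (1.0 - utilization)
               for capacity, utilization in links)


# (capacity in Mbit/s, fraction of capacity in use)
links = [(1000, 0.10), (100, 0.20), (622, 0.90)]
print(path_capacity(links))             # limited by the 100 Mbit/s link
print(path_available_bandwidth(links))  # limited by the heavily used 622 Mbit/s link
```

The two limits come from different links here, which is one reason the two quantities need different measurement techniques.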

Bandwidth estimation • Link techniques try to find the bandwidth of each link (hop) along a path. • Path techniques try to find the bandwidth along the entire path. • Typically large numbers of probes are needed, due to variability in measurements.
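As one illustration of a path technique and of why many probes are needed, here is a sketch of packet-pair dispersion, a well-known approach that these slides do not name. Two packets sent back to back are spread apart by the slowest link, so packet size divided by the arrival gap estimates capacity. The function names and sample gaps are hypothetical, and taking the median is only one simple way to damp cross-traffic noise.

```python
from statistics import median


def packet_pair_estimates(packet_size_bytes, dispersions_s):
    """One capacity estimate (bits/s) per measured inter-arrival gap."""
    bits = packet_size_bytes * 8
    return [bits / gap for gap in dispersions_s if gap > 0]


def estimate_capacity_bps(packet_size_bytes, dispersions_s):
    """Combine many noisy per-pair estimates into a single value."""
    return median(packet_pair_estimates(packet_size_bytes, dispersions_s))


# Gaps in seconds for 1500-byte packets; two samples are distorted,
# one stretched and one compressed by cross-traffic.
gaps = [0.00012, 0.00012, 0.00030, 0.00012, 0.00006]
print(estimate_capacity_bps(1500, gaps))
```

A single pair could have reported either 40 Mbit/s or 200 Mbit/s here, whereas the median of the five samples lands near 100 Mbit/s.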

Tomography • Two forms of this problem • given edge measurements infer something about the inside state of the network (link speeds, bandwidth, congestion) • given internal state of the network infer something about the traffic entering/exiting the network • What measurements yield the most information? • How much might results be off?
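A tiny instance of the first form (edge measurements to internal state), offered as a sketch under strong assumptions rather than a general method. Assume a tree in which each probe crosses one shared link and is then copied to two receivers A and B, and that losses on different links are independent. If the shared link passes a probe with probability s and the branches with probabilities a and b, then P(A) = s·a, P(B) = s·b and P(A and B) = s·a·b, so s = P(A)·P(B) / P(A and B). The function name and probe counts are invented.

```python
def shared_link_success(probes):
    """Estimate the shared link's pass rate from receiver-side observations.

    probes: list of (got_a, got_b) booleans, one pair per probe sent.
    """
    n = len(probes)
    p_a = sum(1 for a, _ in probes if a) / n
    p_b = sum(1 for _, b in probes if b) / n
    p_ab = sum(1 for a, b in probes if a and b) / n
    if p_ab == 0:
        raise ValueError("no probe reached both receivers; estimate undefined")
    return p_a * p_b / p_ab


# 100 probes: 60 seen by both, 20 only by A, 10 only by B, 10 by neither.
probes = ([(True, True)] * 60 + [(True, False)] * 20
          + [(False, True)] * 10 + [(False, False)] * 10)
print(shared_link_success(probes))
```

Neither receiver observes the shared link directly. The estimate comes entirely from how correlated their losses are.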

Why study workload characterization? • Capacity planning • Understanding trends in network usage to predict deployment needs • Interactions between applications and protocols • Input for new protocol design • Predicting effects from network changes • Detecting anomalous behavior

Why study routing dynamics? • Is the Internet's goal of global reachability met? • How fragile is the routing system to failures or attacks? • How much does policy affect performance?
