Infrastructure Primitives for Overlay Networks
This presentation discusses functionality that overlays can share, the diverse requirements of overlay applications, and implementation alternatives for overlay networks. It covers path selection, packet replication, measurement metrics, and experiments.
Infrastructure Primitives for Overlay Networks Karthik Lakshminarayanan (with Ion Stoica and Scott Shenker) SAHARA/i3 Retreat – Summer, 2003
Goal: Share Overlay Functionality • What do overlays share? • Underlying IP infrastructure (of course!) • Underlying hardware (maybe, e.g. PlanetLab) Why not share… • Higher level overlay functionality • Each application designs overlay routing from scratch • Lower deployment barrier: design effort & deployment expense • Network weather information • Each application performs probes to find good overlay paths • Reduce overlay maintenance overhead
Diverse Overlay Requirements What are the requirements for supporting most overlay applications? • Routing control • Adaptive routing based on application-sensitive metrics • Measurements of the virtual link characteristics • Data manipulation • Manipulate/store (e.g. transcode) data on the path to the destination
Our Approach • Embed in the infrastructure: • Low-level routing mechanisms, e.g. forwarding, replication • Third-party services: • Services are implemented at end-hosts, shared using an open interface • Information for making routing decisions, e.g. measurements of path delay, loss-rate, bandwidth • At the end-hosts: • Not shared at all, e.g. policies for choosing paths
Outline • Motivation and Challenges • Infrastructure Primitives • Network Measurements • System Architecture – Weather Service • Experiments • Some Applications
Path Selection [Figure: nodes R, R', n1, n2; packet m] • Similar to “loose source routing” • End-hosts specify points through which the packet is routed • Routing between the specified points is handled by IP
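The path selection primitive described above can be illustrated with a minimal sketch. It assumes the waypoint list travels in the packet, as in loose source routing; the names `Packet`, `forward`, and `route` are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    payload: str
    waypoints: list   # overlay nodes still to visit, in order (assumed header field)
    dest: str         # final receiver
    trace: list = field(default_factory=list)

def forward(pkt: Packet, here: str) -> str:
    """Process the packet at node `here` and return the next address.

    Only the listed waypoints are fixed; how IP gets the packet from one
    waypoint to the next is outside this sketch.
    """
    pkt.trace.append(here)
    if pkt.waypoints and pkt.waypoints[0] == here:
        pkt.waypoints.pop(0)          # this waypoint has been reached
    return pkt.waypoints[0] if pkt.waypoints else pkt.dest

def route(pkt: Packet, src: str) -> list:
    """Follow the packet from `src` until all waypoints and the destination are reached."""
    node = forward(pkt, src)
    while pkt.waypoints or node != pkt.dest:
        node = forward(pkt, node)
    pkt.trace.append(node)
    return pkt.trace

print(route(Packet("m", ["n1", "n2"], "R'"), "R"))
```

The same routine handles a loop that starts and ends at one node, for example `route(Packet("m", ["n1"], "R"), "R")`, which is the shape of the measurement probes later in the talk.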
Packet Replication [Figure: nodes R, R', n1, n2; packet m and replica m1] • End-hosts specify that a particular packet be replicated at a node and then sent along a path
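As a rough illustration of replication, the sketch below forks a copy of a packet at a chosen node and sends the copy along its own path. The function `deliver` and the packet names `m` and `m1` are illustrative assumptions; the slides do not specify an interface.

```python
def deliver(path, replicate_at=None, copy_path=()):
    """Return the routes taken by the original packet and, if requested, its replica.

    path         -- overlay nodes the original packet m visits, in order
    replicate_at -- node on `path` where a copy m1 is made (hypothetical parameter)
    copy_path    -- nodes the copy visits after the replication point
    """
    routes = [("m", list(path))]
    if replicate_at is not None:
        if replicate_at not in path:
            raise ValueError("replication point must lie on the original path")
        idx = path.index(replicate_at)
        # The copy shares the route up to the replication point, then diverges.
        routes.append(("m1", list(path[: idx + 1]) + list(copy_path)))
    return routes

for name, taken in deliver(["R", "n1", "n2", "R'"], replicate_at="n1", copy_path=["R'"]):
    print(name, " -> ".join(taken))
```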
Infrastructure Primitives • Path Selection • Packet Replication Claim: This is enough to do (i) Adaptive routing (ii) Measurements (iii) Data manipulation • Why this approach? • Control path must be outside – collective knowledge to decide what to monitor • No difference between data and measurement traffic – better security, nodes have no incentive to lie
Implementation alternatives • At the IP layer: • Path selection • Implemented in the form of loose source routing • Requires the path in the packet header • Packet replication requires a new primitive • Why we chose i3: • Implements the two primitives without any changes • Path selection: Set up routing state beforehand (instead of in the header) • Robustness to node failures • We know it well! This is one possible realization, and not the only one
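The idea of setting up routing state beforehand, rather than carrying the path in the header, can be sketched with a toy trigger table. This is a simplified illustration and not the i3 API: the class `TriggerTable`, its methods, and the exact-match lookup are assumptions made for the example. A trigger maps an identifier either to another identifier (a further overlay hop) or to an end-host address, and several triggers on the same identifier give replication.

```python
from collections import defaultdict

class TriggerTable:
    """Toy rendezvous table: identifier -> list of ("id", next_id) or ("addr", host)."""

    def __init__(self):
        self.triggers = defaultdict(list)

    def insert(self, ident, target):
        """Install routing state ahead of time."""
        self.triggers[ident].append(target)

    def send(self, ident, data, max_hops=16):
        """Deliver `data` sent to `ident`; return sorted (host, data) deliveries."""
        delivered = []
        frontier = [(ident, 0)]
        while frontier:
            cur, hops = frontier.pop()
            if hops > max_hops:
                raise RuntimeError("trigger chain too long (possible loop)")
            for kind, value in self.triggers.get(cur, []):
                if kind == "addr":
                    delivered.append((value, data))      # reached an end-host
                else:
                    frontier.append((value, hops + 1))   # follow the chain
        return sorted(delivered)

table = TriggerTable()
table.insert("id1", ("id", "id2"))     # path selection: id1 -> id2
table.insert("id2", ("addr", "R'"))    # id2 -> receiver R'
table.insert("id2", ("addr", "R"))     # second trigger on id2: replicate to R
print(table.send("id1", "m"))
```

The packet itself carries only the first identifier; the hops and the replication point are determined by the state installed in advance.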
Metrics of measurement • Round-trip delay • Loss-rate • Available bandwidth • Bottleneck bandwidth … in the process, demonstrate the versatility of the primitives
Round-trip Delay [Figure: measuring node R and overlay nodes n1, n2; packets m and m1] To measure: RTT(n1→n2) • Use the path selection primitive to send packet m along R→n1→R • Use path selection in conjunction with packet replication to send a packet along R→n1→n2→n1→R • The difference between the two round-trip times yields the RTT of the link (n1↔n2)
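The subtraction behind this measurement is small enough to write out. The sketch below assumes both probes are timed at R in the same units; the function name and the clamping of negative results to zero are assumptions added for the example, not part of the slides.

```python
def link_rtt(rtt_short: float, rtt_long: float) -> float:
    """Estimate the RTT of the virtual link n1<->n2.

    rtt_short -- time measured at R for the probe along R->n1->R
    rtt_long  -- time measured at R for the probe along R->n1->n2->n1->R
    """
    # Both probes share the R<->n1 segment, so it cancels in the difference.
    estimate = rtt_long - rtt_short
    # Assumption: jitter can make the difference slightly negative; clamp it.
    return max(0.0, estimate)

# Illustrative numbers only, in milliseconds.
print(link_rtt(rtt_short=40.0, rtt_long=65.0))
```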
One-way Loss Rate [Figure: measuring node R and overlay nodes n1, n2; packets m, m1, m2] To measure: l(n1→n2) • m2 is used to differentiate loss on (n1→n2) from that on (n2→n1) • $(m \wedge \neg m_1 \wedge \neg m_2)$ indicates loss on virtual link (n1→n2) • False positives • False negatives • Probability of false positives/negatives ≈ $O(p^2)$
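The classification rule on this slide can be applied mechanically to a batch of probe outcomes. The sketch below encodes only the stated rule (m received, m1 and m2 not received). Dividing by the total number of probes is an assumption, as are the names `Probe` and `forward_loss_rate`; the slides do not say how the estimate is normalized.

```python
from typing import Iterable, NamedTuple

class Probe(NamedTuple):
    m: bool    # was m received at R?
    m1: bool   # was m1 received at R?
    m2: bool   # was m2 received at R?

def is_forward_loss(p: Probe) -> bool:
    """Rule from the slide: m AND NOT m1 AND NOT m2 is counted as loss on n1->n2."""
    return p.m and not p.m1 and not p.m2

def forward_loss_rate(probes: Iterable[Probe]) -> float:
    """Fraction of probes classified as n1->n2 losses (normalization is assumed)."""
    probes = list(probes)
    if not probes:
        raise ValueError("need at least one probe")
    return sum(is_forward_loss(p) for p in probes) / len(probes)

# Illustrative outcomes: 8 clean probes and 2 matching the loss pattern.
sample = [Probe(True, True, True)] * 8 + [Probe(True, False, False)] * 2
print(forward_loss_rate(sample))
```

Because the rule can misclassify a probe when several packets are lost independently, the slide's false positive and false negative cases still apply to this count.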
Available Bandwidth • Come to the poster session.
What we envision [Figure: Clients A, B, C, D and Weather Services 1 and 2; labeled interactions: network measurements, query/reply routing info, setup routes] Challenge: To make the measurements scale to an infrastructure of 1000s of nodes
Experiments: Delay Estimation • More than 92% of the samples have error < 10% • If we consider median over 15 consecutive samples, 98.3% of the samples have error < 10%
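Taking the median over consecutive samples, as in the second bullet, suppresses isolated outliers. A minimal sketch follows; it assumes a sliding window (the slides do not say whether windows overlap) and a known reference delay for computing relative error. Both helper names are hypothetical.

```python
from statistics import median

def median_filter(samples, window=15):
    """Median of each run of `window` consecutive samples (sliding window assumed)."""
    return [median(samples[i:i + window]) for i in range(len(samples) - window + 1)]

def fraction_within(estimates, truth, tol=0.10):
    """Fraction of estimates whose relative error against `truth` is below `tol`."""
    if not estimates:
        raise ValueError("no estimates")
    return sum(abs(e - truth) / truth < tol for e in estimates) / len(estimates)

# Illustrative data: a steady 10 ms delay with one large outlier.
raw = [10.0] * 14 + [100.0] + [10.0] * 14
print(fraction_within(raw, truth=10.0))
print(fraction_within(median_filter(raw), truth=10.0))
```

In this toy series the single outlier counts against the raw samples but never becomes the median of any window.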
Experiments: Loss-Rate Estimation • Accuracy of 90% in over 89% of the cases (after filtering the few nodes with high losses)
Experiments: Avail-BW Estimation • Within a factor of two for 70% of the pairs • Avail-BW is not static, so this is reasonable
How applications can use this • Adaptive routing: • End-hosts query the WS and construct the overlay • Quality of paths depends on how sophisticated the WS is • No changes to infrastructure if metrics change • Multicast: • Union of different unicast paths that the WS returns • Number of replicas is no larger than the degree of the overlay graph • Finding closest replica: • Client queries the WS to get the best among a set of nodes • WS may export an API that allows this*
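The three uses above can be sketched against a table of measured virtual-link delays. The sketch assumes the weather service (WS) hands back such a table as a nested dictionary and that delay is the metric of interest; the function names and the use of Dijkstra's algorithm are illustrative choices, not something the slides prescribe.

```python
import heapq

def best_path(delays, src, dst):
    """Lowest-delay overlay path from src to dst (Dijkstra); returns (path, delay)."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in delays.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        raise ValueError(f"no overlay path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

def multicast_edges(delays, root, receivers):
    """Multicast as the union of the unicast paths from the root to each receiver."""
    edges = set()
    for r in receivers:
        path, _ = best_path(delays, root, r)
        edges.update(zip(path, path[1:]))
    return edges

def closest_replica(delays, client, replicas):
    """Pick the replica with the lowest path delay from the client."""
    return min(replicas, key=lambda r: best_path(delays, client, r)[1])

# Hypothetical measurements in milliseconds.
delays = {"S": {"a": 10, "b": 50, "c": 100}, "a": {"b": 15, "c": 30}, "b": {}, "c": {}}
print(best_path(delays, "S", "b"))
print(sorted(multicast_edges(delays, "S", ["b", "c"])))
print(closest_replica(delays, "S", ["b", "c"]))
```

A node that appears on several unicast paths, such as `a` here, is where packet replication would be requested, once per outgoing edge.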
Multicast experiment • Nodes at 37 sites in PlanetLab (1-3 per site). • Delay-optimized multicast tree rooted at Stanford • Union of delay-optimized unicast paths • 90% of the nodes had RDP < 1.38; 99.7% of the nodes had RDP < 2
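The slide does not define RDP. The sketch below assumes the usual meaning of relative delay penalty: the delay along the overlay path divided by the delay of the direct unicast path between the same two nodes. The helper names and the sample values are made up for illustration and are not the experiment's data.

```python
def rdp(overlay_delay: float, direct_delay: float) -> float:
    """Relative delay penalty (assumed definition): overlay delay / direct delay."""
    if direct_delay <= 0:
        raise ValueError("direct delay must be positive")
    return overlay_delay / direct_delay

def fraction_below(values, threshold):
    """Fraction of values strictly below `threshold`."""
    if not values:
        raise ValueError("no values")
    return sum(v < threshold for v in values) / len(values)

# Made-up (overlay, direct) delays in milliseconds for four receivers.
pairs = [(20.0, 20.0), (24.0, 20.0), (30.0, 20.0), (50.0, 20.0)]
values = [rdp(o, d) for o, d in pairs]
print(values)
print(fraction_below(values, 2))
```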
Summary of design • Minimalist infrastructure functionality • Delegate routing to applications • Applications know their requirements best • Delegate performance measurements to third-party applications • Allows measurement to evolve to meet changing requirements
Open questions & Future work • Why minimalist design? • Why not more primitives? E.g. for supporting QoS • What if path characteristics are correlated? • Shared bottleneck • Losses at the egress/ingress link • Sub-problems • How much do we lose, if anything, by having incomplete information about network weather? • How much does the accuracy of measurements affect the final outcome? • If the underlying routing is bad, how diverse must the overlay be to do a good job? • Design an API and develop applications based on it