Efficient Broadcasting Strategies for Epidemic Dissemination in Peer-to-Peer Systems

Epidemic Dissemination & Efficient Broadcasting in Peer-to-Peer Systems
Laurent Massoulié, Thomson, Paris Research Lab
Based on joint work with: Bruce Hajek, Sujay Sanghavi, Andy Twigg, Christos Gkantsidis and Pablo Rodriguez

Context
• P2P systems for live streaming & Video-on-Demand
  • PPLive, Sopcast, TVUPlay, Joost, Kontiki…
• Internet hosts form overlay network
• Data exchanges between overlay neighbours
• Aim: real-time playback at all receivers
• Soon the main channel for multimedia diffusion?

Diffusion of Code Red Virus

Diffusion of Code Red Virus
Logistic curve (Verhulst 1838, Lotka 1925,…)
Exponential growth
Optimal global infection time: logarithmic in population size
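
To make the "logarithmic in population size" claim concrete, here is a minimal sketch of a synchronous push epidemic. The round-based model, the function name `push_epidemic_history` and the seed are assumptions made for illustration; they are not taken from the Code Red data on the slide.

```python
import math
import random

def push_epidemic_history(n, seed=0):
    """Number of infected nodes after each round of a synchronous push epidemic.

    Assumption (not from the slides): in every round each infected node
    contacts one node chosen uniformly at random, which becomes infected.
    """
    rng = random.Random(seed)
    infected = {0}                      # node 0 is the initial infective
    history = [len(infected)]
    while len(infected) < n:
        # Every currently infected node picks one uniform random target.
        targets = {rng.randrange(n) for _ in range(len(infected))}
        infected |= targets
        history.append(len(infected))
    return history

n = 1024
history = push_epidemic_history(n)
rounds = len(history) - 1
print("rounds to reach everyone:", rounds)
print("doubling lower bound    :", math.ceil(math.log2(n)))
```

The infected count can at most double per round, so no run can finish in fewer than log2(N) rounds; the early part of `history` shows the exponential phase and the tail shows the saturation typical of a logistic curve.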

Epidemics for live streaming diffusion
[Figure: data packets held at different nodes]
• Mechanism specification: selection rule for
  • target node
  • packet to transmit
Epidemics (one per packet) competing for resources

Problem statement
• Currently deployed systems rely on epidemic approach
• Appeal of simple & decentralised schemes
• Large user populations (10^3 – 10^6)
• High churn (nodes join and leave)
• “Cost of decentralisation”? i.e., can epidemics make efficient use of communication resources?
Metrics: rate and delay

Outline
• Delay-optimal schemes [S. Sanghavi, B. Hajek, LM]
• Rate-optimal schemes [LM, C. Gkantsidis, P. Rodriguez and A. Twigg]
• Outlook

The access constraint scenario
Scarce resource: access capacity
• Models DSL / Cable uplink bandwidth limitations
• Normalised: 1 packet / second …
Bounds on optimal performance
• Throughput = N / (N-1) ≈ 1 (pkt / second)
• Delay = log2(N)
where N: number of nodes
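
The two bounds can be evaluated directly. The sketch below assumes the usual counting arguments behind them: N nodes uploading 1 pkt/s in total must serve N-1 receivers, and the number of copies of a packet can at most double each second. The function names are illustrative, and rounding the delay up to a whole number of seconds is an added assumption.

```python
import math

def throughput_bound(n):
    """Upper bound on the broadcast rate (pkt/s) with n unit-capacity nodes.

    Total upload capacity is n pkt/s and each of the n - 1 receivers must
    receive every packet, so rate * (n - 1) <= n.
    """
    return n / (n - 1)

def delay_bound(n):
    """Lower bound on delay (s): copies of a packet at most double per second.

    Rounded up to a whole number of seconds (an assumption for slotted time).
    """
    return math.ceil(math.log2(n))

for n in (10, 1000, 10**6):
    print(n, round(throughput_bound(n), 6), delay_bound(n))
```

For populations in the 10^3 to 10^6 range the throughput bound is within 0.1% of 1 pkt/s, while the delay bound grows only from 10 to 20 seconds.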

Challenge
Naïve approach
• Random target
• First useful packet
[Figure: fraction of nodes reached vs. time; sender's packets and receiver's packets, with the 1st useful packet highlighted]
Tension between timeliness of delivery and diversity

The “random target / latest packet” policy
[Figure: fraction of nodes reached vs. time; sender's packets 1 2 4 5 7 8 with the latest packet highlighted; receiver's packets unknown to the sender]

The “random target / latest packet” policy
• Diffusion at rate 63% of optimal and with optimal delay feasible (do source coding at source over consecutive data windows)
Main result: Each node receives each packet w.p. 1-1/e ≈ 63% with optimal delay (less than log2(N)), independently for distinct packets.

Proof idea
[Figure: fraction of nodes (up to 1) vs. time, with two curves: nodes that have a pkt with label ≥ t, and nodes that have a pkt with label ≥ t+1]
• Same dynamics as single epidemic diffusion: translated logistic curve
• Number of transmission attempts for packet t: N × area between curves = N
• Number of nodes receiving t: ≈ N (1 - 1/e)
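
The step from "N transmission attempts" to the 1-1/e fraction can be checked numerically. This is a simplified sketch that treats the N attempts as independent and uniformly targeted (an assumption made here for illustration); a given node is then missed by all of them with probability (1 - 1/N)^N, which tends to 1/e.

```python
import math

def fraction_reached(n):
    """Probability that a given node is hit by at least one of n attempts,
    each aimed at a node chosen uniformly at random among n nodes."""
    return 1.0 - (1.0 - 1.0 / n) ** n

limit = 1.0 - 1.0 / math.e   # about 0.632
for n in (10, 1000, 10**6):
    print(n, round(fraction_reached(n), 6), round(limit, 6))
```

The value decreases towards the limit as N grows, so 63% is the large-population figure quoted in the main result.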

Outline
• Delay-optimal schemes [S. Sanghavi, B. Hajek, LM]
• Rate-optimal schemes [LM, C. Gkantsidis, P. Rodriguez and A. Twigg]
• Outlook

Access constraints scenario
• Network assumptions:
  • access capacities, c_i
  • Everyone can send to everyone (complete communication graph)
• Statistical assumptions:
  • source creates fresh packets at instants of Poisson process with rate λ
  • Packet transmission time from node i: Exponential r.v. with mean 1/c_i
⇒ Optimal broadcast rate:

The “Most deprived neighbour / random useful packet” policy
[Figure: sender's packets 1 2 4 5 7 8 compared with the packets held by potential receiver 1 and potential receiver 2]
Source policy: sends “fresh” packets if any (fresh = not sent yet to anyone)
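
One way to read the selection rule is sketched below. It assumes that "most deprived" means the neighbour lacking the largest number of the sender's packets, and that a "useful" packet is one the sender holds and the target lacks. The function name and the receivers' packet sets are hypothetical; only the sender's set 1 2 4 5 7 8 comes from the slide.

```python
import random

def most_deprived_transfer(sender_packets, neighbours, rng=random):
    """Pick (target, packet) for one transmission, or None if nothing is useful.

    sender_packets: set of packet ids held by the sender.
    neighbours: dict mapping neighbour name -> set of packet ids it holds.
    Ties between equally deprived neighbours go to the first one encountered.
    """
    best, best_missing = None, set()
    for name, held in neighbours.items():
        missing = sender_packets - held       # packets useful to this neighbour
        if len(missing) > len(best_missing):
            best, best_missing = name, missing
    if best is None:
        return None
    # Random useful packet: uniform choice among the packets the target lacks.
    return best, rng.choice(sorted(best_missing))

sender = {1, 2, 4, 5, 7, 8}
neighbours = {
    "receiver_1": {1, 5},             # hypothetical contents
    "receiver_2": {1, 4, 5, 7, 8},    # hypothetical contents
}
target, packet = most_deprived_transfer(sender, neighbours, random.Random(0))
print(target, packet)
```

Here receiver_1 lacks four of the sender's packets and receiver_2 lacks one, so receiver_1 is selected and receives one of packets 2, 4, 7 or 8.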

Main result
• Provided λ < λ*, Markov process describing system state is ergodic.
• Hence all packets are received at all nodes after time bounded in probability
Proof: identifies “workload” as Lyapunov function for fluid dynamics of Markov process
Open questions:
• Magnitude of delays (simulations suggest logarithmic)
• Extension to general, not complete graphs

Extension to limited neighbourhoods
• Each node maintains shortlist of neighbours
• Sends to most deprived from neighbour set
• Periodically adds randomly chosen neighbour, and dumps least deprived
Neighbourhood size stays fixed
Ergodicity result still holds: fluid dynamics unchanged
Q: impact of neighbourhood size?
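
The shortlist update can be sketched as follows. Two details are assumptions, since the slide does not settle them: deprivation is measured as the number of the sender's packets a neighbour lacks, and the least deprived node is chosen among the shortlist including the newcomer. All names are illustrative.

```python
import random

def refresh_shortlist(self_id, sender_packets, shortlist, all_nodes, packets_of, rng):
    """Add one random new neighbour, then drop the least deprived one.

    packets_of: dict mapping node -> set of packet ids held by that node.
    Returns a new shortlist of the same size as the input.
    """
    candidates = [v for v in all_nodes if v != self_id and v not in shortlist]
    if not candidates:
        return list(shortlist)            # nobody new to add
    extended = list(shortlist) + [rng.choice(candidates)]
    # Least deprived = lacks the fewest of the sender's packets.
    least = min(extended, key=lambda v: len(sender_packets - packets_of[v]))
    extended.remove(least)
    return extended

packets_of = {"a": {1, 2, 3}, "b": {1}, "c": set()}
new_list = refresh_shortlist(
    "s", {1, 2, 3}, ["a", "b"], ["s", "a", "b", "c"], packets_of, random.Random(0)
)
print(new_list)
```

In this example "c" is the only candidate, and "a" already holds everything the sender has, so "a" is dropped and the shortlist keeps its size of two.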

Network constraints
• Graph connecting nodes
• Capacities assigned to edges
• Achievable broadcast rate [Edmonds, 73]:
  • Equals maximal number of edge-disjoint spanning trees that can be packed in graph
  • Coincides with the minimum, over receivers, of the max-flow ( = min-cut) from the source to that receiver
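
The min-cut characterisation can be computed on a small example. The sketch below uses a plain Edmonds-Karp max-flow (a standard algorithm chosen here for illustration, not something the slides prescribe) and takes the minimum over receivers. The three-node graph and its capacities are made up.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity is a dict of dicts {u: {v: cap}}."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse arcs
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def broadcast_rate(capacity, source):
    """Minimum over receivers of the max-flow from the source."""
    nodes = set(capacity) | {v for nbrs in capacity.values() for v in nbrs}
    return min(max_flow(capacity, source, r) for r in nodes - {source})

# Hypothetical example: source "s" and receivers "a", "b".
capacity = {"s": {"a": 2, "b": 1}, "a": {"b": 1}, "b": {"a": 1}}
print(broadcast_rate(capacity, "s"))
```

In this example node "b" is the bottleneck (max-flow 2, against 3 for "a"), and two edge-disjoint trees rooted at s do exist: s→a→b and s→b→a.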

Random useful packet selection and Edmonds’ theorem
[Figure: sender holding packets 1 2 4 5 7 8; neighbours' packets partly unknown]
• Based on local information
• No explicit construction of spanning trees
Main result: When injection rate λ strictly feasible, Markov process is ergodic

Proof idea
[Figure: original network (source s, nodes 1, 2, 3, injection rate λ) and induced network whose nodes are the sets s; s,1; s,2; s,1,2; s,1,3; s,2,3; s,1,2,3]
Variables x_A: Number of packets present exactly at nodes in set A
• Fluid Renormalisation:
  • The x_A obey deterministic dynamics
• Convergence to zero of fluid trajectories:
  • shown by using Lyapunov function
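
The state variables can be illustrated by counting, for each set A of nodes, the packets held by exactly the nodes in A. The helper name and the packet holdings below are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter

def occupancy_counts(holdings):
    """Map each node set A (a frozenset) to the number of packets held
    exactly by the nodes in A.

    holdings: dict mapping node -> set of packet ids held by that node.
    """
    holders = {}
    for node, packets in holdings.items():
        for packet in packets:
            holders.setdefault(packet, set()).add(node)
    return Counter(frozenset(nodes) for nodes in holders.values())

# Hypothetical snapshot of a network with source "s" and nodes "1", "2", "3".
holdings = {"s": {1, 2, 3, 4}, "1": {1, 2}, "2": {2, 3}, "3": {2}}
x = occupancy_counts(holdings)
for subset, count in x.items():
    print(sorted(subset), count)
```

Packet 2 has reached every node, so it is counted under the full set s,1,2,3; packet 4 is still only at the source and is counted under the set s.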

Comments
• Provides “analytical” proof of Edmonds’ theorem
• Delays?

Conclusions
• Epidemic diffusion
  • Straightforward implementation
  • Efficient use of bandwidth resources
• Random & local decisions lead to global optimum

Outlook
• Open problems
  • Schemes both delay- and rate-optimal?
  • Concurrent stream diffusions?
  • Stability proofs without the Lyapunov function?
