Reliable multicast from end-to-end solutions to active solutions C. Pham RESO/LIP-Univ. Lyon 1, France DEA DIF Nov. 13th, 2002 ENS-Lyon, France
Q&A • Q1: How many people in the audience have heard about multicast? • Q2: How many people in the audience know basically what multicast is? • Q3: How many people in the audience have ever tried multicast technologies? • Q4: How many people think they need multicast?
My guess on the answers • Q1: How many people in the audience have heard about multicast? • 100% • Q2: How many people in the audience know basically what multicast is? • about 40% • Q3: How many people in the audience have ever tried multicast technologies? • 0% • Q4: How many people think they need multicast? • 0%
Purpose of this tutorial • Provide a comprehensive overview of current multicast technologies • Show the evolution in multicast technologies • Achieve 100%, 100%, 30% and 70% on the previous questions next time!
multicast! multicast! multicast! Everybody's talking about multicast! How can multicast change the way people use the Internet? Really annoying! Why would I need multicast, by the way?
From unicast… • Problem • Sending the same data to many receivers via unicast is inefficient • Example • Popular WWW sites become serious bottlenecks (figure: the sender transmits a separate copy of the data to each receiver)
…to multicast on the Internet. • Not n unicasts from the sender's perspective • Efficient one-to-many data distribution • Towards low latency, high bandwidth (figure: with IP multicast, a single copy of the data leaves the sender and is replicated inside the network towards the receivers)
New applications for the Internet Think about… • high-speed www • video-conferencing • video-on-demand • interactive TV programs • remote archival systems • tele-medicine, whiteboard • high-performance computing, grids • virtual reality, immersion systems • distributed interactive simulations/gaming…
A very simple example • File replication • 10MBytes file • 1 source, n receivers (replication sites) • 512KBits/s upstream access • n=100 • Tx= 4.55 hours • n=1000 • Tx= 1 day 21 hours 30 mins!
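The replication numbers above can be checked with a quick back-of-the-envelope computation (assuming a 10-MByte file and a 512-kbit/s upstream link over which the n copies are sent one after the other):

```python
# Sequential unicast replication: the source sends the whole file
# once per receiver over its single upstream link.

FILE_BITS = 10 * 2**20 * 8     # 10 MBytes expressed in bits
UPSTREAM_BPS = 512 * 1000      # 512 Kbits/s upstream access

def unicast_time(n_receivers: int) -> float:
    """Seconds to serve all receivers one after the other."""
    return n_receivers * FILE_BITS / UPSTREAM_BPS

for n in (100, 1000):
    print(f"n={n}: {unicast_time(n) / 3600:.2f} hours")
```

With n=100 this gives about 4.55 hours, and with n=1000 about 45.5 hours (1 day 21 hours 30 mins), matching the slide; with multicast the source would send the file essentially once.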
A real example: LHC (DataGrid) (figure: the tiered LHC computing model; source: DataGrid) • Online system: a bunch crossing every 25 nsecs, ~100 triggers per second, each event ~1 MByte in size; ~PBytes/sec at the detector, ~100 MBytes/sec towards the offline farm (~20 TIPS; 1 TIPS = 25,000 SpecInt95, a 1999 PC = ~15 SpecInt95) • Tier 0: CERN Computer Center (> ~20 TIPS) with HPSS mass storage • Tier 1 (~4 TIPS each): regional centers (Fermilab, France, UK, Italy) with HPSS, fed at ~622 Mbits/sec or by air freight, interconnected at ~2.4 Gbits/sec • Tier 2 centers, reached at ~622 Mbits/sec • Tier 3: institute servers (~0.25 TIPS) with a physics data cache; physicists work on analysis "channels", each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server • Workstations at 100 - 1000 Mbits/sec
Multicast for computational grids (figure: grid application users; from Dorian Arnold: Netsolve Happenings)
Some grid applications • Astrophysics: black holes, neutron stars, supernovae • Mechanics: fluid dynamics, CAD, simulation • Distributed & interactive simulations: DIS, HLA, training • Chemistry & biology: molecular simulations, genomic simulations
Reliable multicast: a big win for grids • Data replication • Code & data transfers, interactive job submissions • Data communications for distributed applications (collective & gather operations, sync. barrier) • Databases, directory services (figure: one multicast address group spanning the SDSC IBM SP, 1024 procs (5x12x17 = 1020), the NCSA Origin Array, 256+128+128 (5x12x(4+2+2) = 480), and a CPlant cluster, 256 nodes)
From reliable multicast to the Nobel prize! • "We see something, but it is too weak. Please simulate to enhance the signal!" • Resource Estimator: "Need 5TB, 2TF. Where can I do this?" • Resource Broker: "LANL is the best match… but down for the moment" • Resource Broker: "7 sites OK, but need to send data fast…" • "OK!" • From President@earth.org: "Congratulations, you have done a great job, it's the discovery of the century!! The phenomenon was short but we managed to react quickly. This would not have been possible without efficient multicast facilities enabling quick reaction and fast distribution of data. The Nobel Prize is on the way :-)"
Wide-area interactive simulations (figure: a computer-based submarine simulator, a battle-field simulation display and a human-in-the-loop flight simulator, connected through the INTERNET)
The challenges of multicast SCALABILITY SCALABILITY SCALABILITY SCALABILITY
Part I The IP multicast model
A look back in history of multicast • History • Long history of usage on shared medium networks • Data distribution • Resource discovery: ARP, Bootp, DHCP • Ethernet • Broadcast (software filtered) • Multicast (hardware filtered) • Multiple LAN multicast protocols • DECnet, AppleTalk, IP
IP Multicast - Introduction • Efficient one to many data distribution • Tree style data distribution • Packets traverse network links only once • replication/multicast engine at the network layer • Location independent addressing • IP address per multicast group • Receiver-oriented service model • Receivers subscribe to any group • Senders do not know who is listening • Routers find receivers • Similar to television model • Contrasts with telephone network, ATM
The Internet group model • multicast/group communications means... • 1→n as well as n→m • a group is identified by a class D IP address (224.0.0.0 to 239.255.255.255) • an abstract notion that does not identify any host! (figure: from the logical view — sources and receivers subscribed to a multicast group at two sites — to the physical view — hosts on their Ethernets, multicast routers, and the multicast distribution tree across the Internet) from V. Roca
Example: video-conferencing (figure: hosts joined to one multicast address group) from UREC, http://www.urec.fr
The Internet group model... (cont’) • local-area multicast • uses the diffusion capabilities of the physical layer (e.g. Ethernet) • efficient and straightforward • wide-area multicast • requires going through multicast routers, using IGMP/multicast routing/... (e.g. DVMRP, PIM-DM, PIM-SM, PIM-SSM, MSDP, MBGP, BGMP, etc.) • routing within the same administrative domain is simple and efficient • inter-domain routing is complex, not fully operational from V. Roca
IP Multicast Architecture (figure: the service model on top; hosts talk to routers through a host-to-router protocol (IGMP); routers run multicast routing protocols (various) among themselves)
Multicast and the TCP/IP layered model (figure: in user space, the application and higher-level services built from blocks for security, reliability, management, congestion control and others; in kernel space, the socket layer over UDP and TCP, then IP / IP multicast with ICMP, IGMP and multicast routing, and the device drivers below) from V. Roca
Internet Group Management Protocol • IGMP: “signaling” protocol to establish, maintain and remove groups on a subnet. • Objective: keep the router up-to-date with the group membership of the entire LAN • Routers need not know who all the members are, only that members exist • Each host keeps track of which multicast groups it is subscribed to • The socket API informs the IGMP process of all joins
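From the application's point of view, the join that triggers IGMP is a single socket option. A minimal sketch (the group address and port below are only illustrative; the join may fail on a machine without a multicast-capable interface, which the sketch tolerates):

```python
# Joining a multicast group through the socket API: the kernel's IGMP
# machinery then announces the membership on the LAN with a Report.
import socket
import struct

GROUP = "224.2.0.1"   # illustrative class D (multicast) address
PORT = 5000           # illustrative UDP port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# IP_ADD_MEMBERSHIP takes a struct ip_mreq:
# the group address plus the local interface (0.0.0.0 = any).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                   socket.inet_aton("0.0.0.0"))
try:
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    joined = True
except OSError:
    joined = False   # e.g. no multicast-capable interface available

# once joined, sock.recvfrom(1500) receives datagrams sent to the group
```

Note that the application never names the other members: it subscribes to the group address only, exactly as the receiver-oriented service model prescribes.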
IGMP: subscribe to a group (1) (figure: the router's membership table is empty; it periodically sends an IGMP Query to the all-hosts address 224.0.0.1, reaching all multicast hosts on the subnet) from UREC
IGMP: subscribe to a group (2) (figure: a host that has already subscribed to the group sends a Report for it) from UREC
IGMP: subscribe to a group (3) (figure: another host sends a Report for the group it joins; the router records the group in its membership table) from UREC
Data distribution example (figure: data addressed to a subscribed group reaches the router, which — "OK" — forwards it onto the subnet where members are present) from UREC
IGMP: leave a group (1) (figure: a host sends a Leave for the group to the all-routers address 224.0.0.2, reaching the multicast-enabled router on the subnet) from UREC
IGMP: leave a group (2) (figure: the router sends a group-specific IGMP Query for the group being left) from UREC
IGMP: leave a group (3) (figure: "Hey, I'm still here!" — a remaining member sends a Report for the group, so the router keeps forwarding it) from UREC
IGMP: leave a group (4) (figure: the last member sends a Leave for the group to the all-routers address) from UREC
IGMP: leave a group (5) (figure: the router sends a group-specific IGMP Query; no Report comes back, so the group is removed from its table) from UREC
Part II Introducing reliability
User perspective of the Internet from UREC, http://www.urec.fr
Links: the basic element in networks • Backbone links • optical fibers • 2.5 to 160 Gbits/s with DWDM techniques • End-user access • 9.6 Kbits/s (GSM) to 2 Mbits/s (UMTS) • V.90 56 Kbits/s modem on twisted pair • 64 Kbits/s to 1920 Kbits/s ISDN access • 512 Kbits/s to 2 Mbits/s with xDSL modem • 1 Mbits/s to 10 Mbits/s cable-modem • 155 Mbits/s to 2.5 Gbits/s SONET/SDH
Routers: key elements of internetworking • Routers • run routing protocols and build routing table, • receive data packets and perform relaying, • may have to consider Quality of Service constraints for scheduling packets, • are highly optimized for packet forwarding functions.
The Wild Wild Web • heterogeneity, link failures, congested routers • packet loss, packet drop, bit errors… • what happens to important data?
Multicast difficulties • At the routing level • management of the group address (IGMP) • dynamic nature of the group membership • construction of the multicast tree (DVMRP, PIM, CBT…) • multicast packet forwarding • At the transport level • reliability, loss recovery strategies • flow control • congestion avoidance
Reliability Models • Reliability requires redundancy to recover from uncertain loss or other failure modes. • Two types of redundancy: • Spatial redundancy: independent backup copies • Forward error correction (FEC) codes • Problem: requires extra overhead; since the FEC is itself carried in packets, it cannot recover from the erasure of all packets • Temporal redundancy: retransmit if packets are lost or in error • Lazy: trades off response time for reliability • Design of status reports and retransmission optimization is important
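Spatial redundancy can be illustrated with the simplest FEC code there is: one XOR parity packet over a block, which lets a receiver rebuild any single lost packet without contacting the sender (a toy sketch; real codes such as Reed-Solomon tolerate more losses):

```python
# Toy FEC: one XOR parity packet protects a block of equal-sized
# data packets against the loss of any single packet.

def xor_parity(packets):
    """Compute the byte-wise XOR of all packets in the block."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover(received, parity):
    """XOR the survivors into the parity to rebuild the missing packet."""
    missing = bytearray(parity)
    for pkt in received:
        for i, b in enumerate(pkt):
            missing[i] ^= b
    return bytes(missing)

block = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(block)
# suppose packet 1 (b"BBBB") is lost in transit:
rebuilt = recover([block[0], block[2]], parity)
assert rebuilt == b"BBBB"
```

The overhead trade-off from the slide is visible here: one extra packet per block buys recovery from one erasure, and no amount of parity helps if the entire block is erased.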
Temporal Redundancy Model • Packets: sequence numbers, CRC or checksum • Status reports (sent on timeout): ACKs, NAKs, SACKs, bitmaps • Retransmissions: packets, FEC information
Part III End-to-end solutions
End-to-end solutions for reliability • Sender-reliable • The sender detects packet losses by gaps in the ACK sequence • Easy resource management • Receiver-reliable • Receivers detect packet losses and send NACKs towards the source
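The receiver-reliable side can be sketched in a few lines: each receiver tracks the next expected sequence number and, when a packet arrives beyond it, NACKs the gap (a minimal sketch; the class and method names are illustrative):

```python
# Receiver-reliable loss detection: a gap in the sequence numbers
# tells the receiver which packets to NACK towards the source.

class Receiver:
    def __init__(self):
        self.next_expected = 0

    def on_packet(self, seq: int) -> list:
        """Return the sequence numbers to NACK (the gap before seq)."""
        nacks = list(range(self.next_expected, seq))
        self.next_expected = max(self.next_expected, seq + 1)
        return nacks

r = Receiver()
assert r.on_packet(0) == []
assert r.on_packet(1) == []
assert r.on_packet(4) == [2, 3]   # packets 2 and 3 were lost
```

Note a limitation that the sender-reliable design does not have: a gap is only noticed when a *later* packet arrives, so the loss of the last packet of a transfer must be caught by some other mechanism (e.g. periodic session messages, as SRM does).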
Challenge: Reliable multicast scalability • many problems arise with 10,000 receivers... • Problem 1: scalable control traffic • ACK each data packet (à la TCP)... oops, 10,000 ACKs/pkt! • NAK (negative ack) only on failure... oops, if a pkt is lost close to the src, 10,000 NAKs! (figure: NACK implosion at the source)
Challenge: Reliable multicast scalability • Problem 2: exposure • receivers may receive the same packet several times (figure: retransmitted data4 packets also reach receivers that never lost them)
One example: SRM (Scalable Reliable Multicast) • Receiver-reliable • NACK-based • Not much per-receiver state at the sender • Every member may multicast a NACK or a retransmission
SRM (con’t) • NACK/Retransmission suppression • Delay before sending • Based on RTT estimation • Deterministic + Stochastic • Periodic session messages • Sequence number: detection of loss • Estimation of distance matrix among members
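The deterministic + stochastic delay can be sketched as follows: before NACKing, each receiver waits a backoff drawn from [C1·d, (C1+C2)·d], where d is its estimated distance to the source; whoever fires first multicasts the NACK, and the others, hearing it before their own timers expire, suppress theirs (the constants and distances below are illustrative, not SRM's tuned values):

```python
# Sketch of SRM's request (NACK) suppression timer:
# deterministic part C1*d plus a stochastic part uniform in [0, C2*d].
import random

C1, C2 = 2.0, 2.0   # illustrative constants

def request_timer(distance_to_src: float) -> float:
    """Backoff before multicasting a NACK, in the same units as d."""
    return C1 * distance_to_src + random.uniform(0, C2 * distance_to_src)

# three receivers that all detected the same loss, at different
# (illustrative) one-way distances from the source:
timers = {host: request_timer(d)
          for host, d in {"r1": 0.01, "r2": 0.05, "r3": 0.05}.items()}
first = min(timers, key=timers.get)
# `first` multicasts the NACK; the others hear it before their own
# timers expire (with high probability) and cancel their requests.
```

The deterministic term favors receivers close to the source (here r1), so with high probability only one NACK per loss reaches the group instead of one per receiver.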
SRM Request Suppression (figure) from Haobo Yu, Christos Papadopoulos