Multimedia conferencing Raphael Coeffic (rco@iptel.org) Based partly on slides of Ofer Hadar, Jon Crowcroft
Which Applications?
• Conferencing:
  • Audio/video communication and application sharing
  • First multicast session: IETF, 1992
  • Many-to-many scenarios
• Media broadcast:
  • Internet TV and radio
  • One-to-many scenario
• Gaming:
  • Many-to-many scenario
What is needed?
• Efficient transport:
  • Enable real-time transmission.
  • Avoid sending the same content more than once.
  • The best transport depends on available bandwidth and technology.
• Audio processing:
  • How to ensure audio/video quality?
  • How to mix the streams?
• Conference setup:
  • Who is allowed to start a conference?
  • How fast can a conference be initiated?
• Security and privacy:
  • How to prevent unwanted people from joining?
  • How to secure the exchanged content?
• Floor control:
  • How to maintain some speaking order?
How to Realize? Centralized
• All participants register at a central point
• All send to the central point
• The central point forwards to the others
• Simple to implement
• Single point of failure
• High bandwidth consumption at the central point:
  • Must receive N flows
• High processing overhead at the central point:
  • Must decode N flows, mix the flows, and encode N flows
  • With no mixing, the central point would send N×(N−1) flows
• Appropriate for small to medium-sized conferences
• Simple to manage and administer:
  • Allows access control and secure communication
  • Allows usage monitoring
  • Supports floor control
• Most widely used scenario
• No need to change end systems
• Tightly coupled: some instances know all information about all participants at all times
How to Realize? Full Mesh
• All participants establish a connection to each other
• All can send directly to the others
• Each host must maintain N connections
• Outgoing bandwidth:
  • Send N copies of each packet
  • A simple voice session at 64 kb/s translates to 64×N kb/s
• Incoming bandwidth:
  • If silence suppression is used, only active speakers send data
  • With video, a lot of bandwidth may be consumed
  • Unless only active speakers send video
• Floor control only possible with cooperating users
• Security: simple! Do not send data to members you do not trust
• End systems need to mix the traffic: more complex end systems
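The flow counts above can be sketched with a small calculation. This is a hypothetical helper, not part of the slides: it counts outgoing streams per topology for an N-party conference, using N−1 where a host does not send to itself (the slides round this to N).

```python
def flows(n):
    """Approximate outgoing stream counts for an n-party conference
    (hypothetical helper; 64 kb/s assumed per voice stream)."""
    return {
        # centralized, no mixing: bridge relays every stream to everyone else
        "central_no_mix_out": n * (n - 1),
        # centralized with mixing: one mixed stream back to each participant
        "central_mix_out": n,
        # full mesh: each host sends one copy to every other host
        "mesh_out_per_host": n - 1,
    }

for n in (3, 10):
    f = flows(n)
    print(n, f, "mesh uplink:", 64 * f["mesh_out_per_host"], "kb/s")
```

The quadratic `central_no_mix_out` term is why mixing (or limiting conference size) matters for the centralized model, while the mesh pushes the per-host uplink cost onto every participant.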
How to Realize? End-point based
• All participants establish a connection to the chosen mixer.
• Outgoing bandwidth at the mixing end point:
  • Send N copies of each packet
  • A simple voice session at 64 kb/s translates to 64×N kb/s
• Incoming bandwidth:
  • If silence suppression is used, only active speakers send data
  • With video, a lot of bandwidth may be consumed
  • Unless only active speakers send video
• One of the end systems needs to mix the traffic: a more complex end system.
• The most common solution for three-way conferencing.
How to Realize? Peer-to-Peer
• Mixing is done at the end systems
• Increases processing overhead at the end systems
• Increases overall delay:
  • Streams are possibly mixed multiple times
• If central points leave a conference, the conference is dissolved
• Security: must trust all members
  • Any member could send all data to non-trusted users
• Access control: must trust all members
  • Any member can invite new members
• Floor control: requires cooperating users
Transport considerations
• Transport layer:
  • Most group communication systems run on top of unicast sessions.
  • Very popular in the past: multicast.
• Application layer:
  • RTP over UDP.
  • Why not TCP?
    • Better NAT traversal capabilities (used by Skype as a last resort).
    • But not really suitable for real-time feedback (why?).
• Control protocol:
  • Interactive conferencing: SIP, H.323, Skype, etc.
  • Webcast: RTSP, RealAudio and other flavours.
• Session description:
  • SDP (Session Description Protocol).
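To make "RTP over UDP" concrete, here is a minimal sketch that parses the fixed 12-byte RTP header defined in RFC 3550. The example packet bytes are made up for illustration; CSRC entries and header extensions are only counted, not parsed.

```python
import struct

def parse_rtp_header(packet: bytes):
    """Parse the fixed 12-byte RTP header (RFC 3550). Sketch only:
    CSRC list and header extensions are not walked."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # always 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),     # e.g. start of a talk spurt
        "payload_type": b1 & 0x7F,     # 0 = PCMU (G.711 u-law)
        "sequence": seq,               # for loss/reordering detection
        "timestamp": ts,               # in media clock units, e.g. 8 kHz
        "ssrc": ssrc,                  # identifies the sending source
    }

# Example: version 2, PT 0 (G.711 PCMU), seq 1, ts 160, SSRC 0x1234
hdr = bytes([0x80, 0x00]) + struct.pack("!HII", 1, 160, 0x1234)
print(parse_rtp_header(hdr))
```

The sequence number and timestamp are exactly what TCP would hide: the receiver uses them to detect loss and jitter itself and decide what to play, rather than waiting for retransmissions.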
IP Multicast
• Why?
  • Most group communication applications are built on top of unicast sessions.
  • With unicast, each single packet has a unique recipient.
• How?
  • Enhance the network with support for group communication
  • Optimal distribution is delegated to the network routers instead of the end systems
  • Receivers inform the network of their wish to receive the data of a communication session
  • Senders send a single copy, which is distributed to all receivers
Multicast vs. Unicast
[Diagram: hosts A, B, C, D, E; file transfer from C to A, B, D and E]
• Unicast: multiple copies
• Multicast: single copy
IP Multicast
• True N-way communication
  • Any participant can send at any time and everyone receives the message
• Unreliable delivery
  • Based on UDP. Why? Avoids hard problems (e.g., ACK explosion)
• Efficient delivery
  • Packets only traverse network links once (i.e., tree delivery)
• Location-independent addressing
  • One IP address per multicast group
• Receiver-oriented service model
  • Receivers can join/leave at any time
  • Senders do not know who is listening
IP Multicast addresses
• Reserved IP addresses
  • Special IP addresses (class D): 224.0.0.0 through 239.255.255.255
  • Class D: 1110 prefix + 28 bits → ~268 million groups (plus scoping for additional reuse)
  • 224.0.0.x: local network only
  • 224.0.0.1: all hosts
• Static addresses for popular services (e.g., SAP, the Session Announcement Protocol)
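The receiver-oriented model above maps directly onto the socket API: a receiver joins a group and the kernel signals its interest upstream via IGMP. A minimal sketch, where the group address and port are arbitrary examples and `join_group` is only defined, not run:

```python
import socket
import struct

def is_multicast(addr: str) -> bool:
    """True if addr is in the IPv4 class-D range 224.0.0.0-239.255.255.255."""
    first = int(addr.split(".")[0])
    return 224 <= first <= 239

def join_group(grp: str, port: int) -> socket.socket:
    """Open a UDP socket and ask the kernel (via IGMP) to join group grp.
    Sketch only: error handling and interface selection omitted."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    # group address + local interface (0.0.0.0 = let the kernel choose)
    mreq = struct.pack("4s4s", socket.inet_aton(grp), socket.inet_aton("0.0.0.0"))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return s

print(is_multicast("224.0.0.1"))    # True: the "all hosts" group
print(is_multicast("192.168.1.1"))  # False: ordinary unicast address
```

Note that the sender needs no such call: it simply sends one UDP datagram to the group address, which is exactly the "senders do not know who is listening" property.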
Alternatives to Multicast
• Use application-level multicast
  • Multicast routing is done using end hosts
  • Hosts build multicast routing tables and act as multicast routers (but at the application level)
• Users request content using unicast
  • Content is distributed over unicast to the final users
Application level Multicast vs. unicast
[Diagram: distribution from a content source via application-level multicast vs. traditional unicast]
Conference mixer architecture
• Main components of a centralized conference mixer:
  • Coder/decoder (plus quality-ensuring components)
  • Synchronization
  • Mixer
• Processing pipeline: decode → synchronize → mix → encode
Audio Mixing
[Diagram: participants A (G.711), B (G.729) and C (GSM), each with a decoder (D) into the mixer and an encoder (E) out of it, driven by a periodic timer]
• The mixer sums all decoded streams: X = A + B + C
• Each participant receives the sum minus their own stream:
  • A receives X − A = B + C
  • B receives X − B = A + C
  • C receives X − C = A + B
• E: encoder, D: decoder
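The N−1 mixing scheme in the diagram can be sketched in a few lines: sum all decoded frames once, then subtract each participant's own frame so nobody hears an echo of themselves. Frames here are plain lists of linear PCM samples; decoding, encoding and the periodic timer are omitted.

```python
def mix(frames):
    """frames: {participant: [pcm samples]} -> {participant: mixed samples}.
    Sketch of N-1 mixing; all frames assumed equal length."""
    length = len(next(iter(frames.values())))
    # X = A + B + C: compute the full sum once, not once per participant
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    out = {}
    for name, f in frames.items():
        # X - own stream, clamped to the 16-bit PCM range
        out[name] = [max(-32768, min(32767, total[i] - f[i]))
                     for i in range(length)]
    return out

frames = {"A": [100, 200], "B": [10, 20], "C": [1, 2]}
print(mix(frames))  # A hears B+C = [11, 22], etc.
```

Computing the sum once and subtracting is what keeps the mixer's cost linear in the number of participants instead of quadratic.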
Audio Quality
• Mostly based on "best effort" networks:
  • No guarantees whatsoever.
  • Packets get lost and/or delayed depending on the congestion status of the network.
• Depending on the codec, different quality levels can be reached:
  • Mostly reducible to a "needed bandwidth vs. quality" trade-off.
  • Wanted properties: loss resilience, low complexity (easy to implement in embedded hardware).
• Audio data has to be played out at the same rate it was sampled:
  • Different buffering techniques have to be considered, depending on the application.
  • Pure streaming (radio/TV) is not interactive and thus not affected by delay. Quality is everything.
  • Interactive conferencing needs short delays to guarantee the real-time property. Delay is experienced as "very annoying" by users in such applications.
Codec quality measurements
• Codec quality is usually reported as a Mean Opinion Score (MOS) from listening tests.
Audio quality: packet loss
• Packet loss:
  • The impact on voice quality depends on many factors:
    • Average rate: rates under 2–5% (depending on the codec) are almost inaudible. Over 15% (highly dependent on the burstiness), most calls are experienced as unintelligible.
    • Burstiness: depending on the loss distribution, the impairment can vary from small artifacts, thanks to packet loss concealment, to really annoying quality loss.
  • Modern codecs like iLBC, which were designed exclusively for VoIP, are much more loss-resistant and should thus be preferred to PSTN-based low-bitrate codecs.
  • Considering media servers, and especially conferencing bridges, we should concentrate on receiver-based methods, as any other method would not be compatible with the customers' phones.
  • Solutions: support appropriate codecs, assert a minimal link quality, and implement a reasonable packet loss concealment (PLC) algorithm.
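As an illustration of a receiver-based method, here is a deliberately simple packet loss concealment sketch: repeat the last good frame with attenuation, fading towards silence on consecutive losses. This repeat-and-attenuate scheme is an assumption for illustration; real codec PLC (e.g. in iLBC or G.711 Appendix I) is considerably more elaborate.

```python
def conceal(stream):
    """stream: list of frames (lists of PCM samples), None = lost frame.
    Returns a stream with losses concealed by attenuated repetition."""
    last, gain, out = None, 1.0, []
    for frame in stream:
        if frame is not None:
            last, gain = frame, 1.0          # good frame: pass through, reset
            out.append(frame)
        elif last is not None:
            gain *= 0.5                      # fade out on consecutive losses
            out.append([int(s * gain) for s in last])
        else:
            out.append([0] * 2)              # nothing to repeat yet: silence
    return out

print(conceal([[100, -100], None, None, [50, 50]]))
```

The fade-out matters for burst loss: repeating one frame unchanged for long sounds like a machine-gun artifact, while decaying to silence degrades more gracefully.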
Audio quality: jitter
• Delay variation (jitter)
  • Why?
    • Varying buffering time at the routers on the packets' way.
    • Inherent to the transmission medium (e.g., WiFi).
  • Depending on the buffering algorithm, quality impairments are mostly caused by a too-high ear-to-mouth delay or by late loss.
• Ear-to-mouth delay:
  • While delays under 100 ms are not noticeable, values over 400 ms make a natural conversation very difficult.
• Late loss:
  • If the buffering delay is smaller than the actual delay, some packets arrive after their playout schedule. This effect is called "late loss".
• Delivering good voice quality means, apart from packet loss concealment, minimizing delay and late loss.
Adaptive playout
• Static buffer
  • Playout is delayed by a fixed value.
  • The buffer size has to be computed once for the rest of the call.
  • Some clients implement a panic mode, increasing the buffer size dramatically (×2) if the late loss rate is too high.
• Advantages:
  • Very low complexity.
• Drawbacks:
  • High delay.
  • Performs poorly if the jitter is too high.
  • Does not solve the clock skew problem.
Adaptive playout (2)
• Dynamic buffer: talk-spurt based.
  • Within a phone call, a speaker is rarely active all the time, so it is possible to distinguish between voiced and unvoiced segments.
  • Adjusting the buffering delay within unvoiced segments has no negative impact on the voice quality.
  • Using a delay prediction algorithm on the previous packets, we then try to calculate the appropriate buffering delay for the next voiced segment.
• Advantages:
  • Low complexity.
  • Solves the clock skew problem.
• Drawbacks:
  • Needs Voice Activity Detection (VAD), either at the sender or at the receiver.
  • High delay.
  • Performs poorly if the jitter varies fast (within a voiced segment).
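One classic delay prediction algorithm for the talk-spurt scheme is an exponentially weighted estimate of the mean delay and its variation, with the playout delay for the next spurt set to mean plus a safety margin. The estimator below is a sketch of that approach; the value of `ALPHA` and the `4 ×` variation margin are conventional choices from the literature, not from these slides.

```python
ALPHA = 0.998002  # smoothing factor: heavy weight on history

class PlayoutEstimator:
    """Estimates the buffering delay to apply at the next talk spurt."""

    def __init__(self):
        self.delay = 0.0   # smoothed network delay estimate (ms)
        self.var = 0.0     # smoothed delay variation estimate (ms)

    def update(self, network_delay_ms):
        """Feed the observed one-way delay of each received packet."""
        self.delay = ALPHA * self.delay + (1 - ALPHA) * network_delay_ms
        self.var = (ALPHA * self.var
                    + (1 - ALPHA) * abs(self.delay - network_delay_ms))

    def playout_delay(self):
        """Mean delay plus a safety margin against late loss."""
        return self.delay + 4 * self.var

est = PlayoutEstimator()
for d in [20, 25, 22, 40, 21]:   # example per-packet delays in ms
    est.update(d)
print(est.playout_delay())
```

Because the estimate is only applied at spurt boundaries, the buffer can shrink or grow between sentences without audible glitches, which is exactly the property the slide describes.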
Adaptive playout (3)
• Dynamic buffer: packet based.
  • Based on Waveform Similarity Overlap-Add (WSOLA) time-scale modification.
  • Enables packet scaling without pitch distortion.
  • Very good voice quality: scaling factors from 0.5 to 2.0 are mostly inaudible if applied locally.
  • But: high processing complexity.