Scalable Service Oriented Architecture for Audio/Video Conferencing

By Ahmet Uyar, Wednesday, March 23, 2005

Outline • Research Issues • Criteria for videoconferencing systems • Overview of current videoconferencing systems • Overview of GlobalMMCS architecture • NaradaBrokering overview and additions • Performance tests for audio/video delivery • Service oriented architecture for videoconferencing • Conclusion

Research Issues • We investigate how to develop scalable and universally accessible videoconferencing systems over the Internet. • We propose using publish/subscribe event broker systems to distribute real-time audio and video streams in videoconferencing sessions, and we investigate the related issues of scalability, performance, data representation, meeting management and media processing services. • Since real-time audio/video delivery requires low latency and high bandwidth, we extensively investigate the performance and scalability of this software-based messaging middleware. • We propose a service oriented architecture for videoconferencing. We identify the tasks performed in videoconferencing sessions and provide an independently scalable component for each task. We identified three main tasks in videoconferencing sessions: • audio/video distribution • media processing • meeting management.

Criteria for Videoconferencing Systems • We identified the following criteria for videoconferencing systems: • Scalability • Security • Traversing through firewalls, proxies and NAT • Supporting heterogeneous clients • Easy to develop, maintain and use • Support for data conferencing

Videoconferencing Standards and Systems • Multicast based systems • AccessGrid • H.323 based systems • Polycom • CUseeMe • VRVS

Multicast Based Systems • AccessGrid is the most commonly used room-based videoconferencing system for group communications. • Scales well. • Difficult to provide security services: there is no authority to manage multicast IP addresses, and multicast is vulnerable to denial-of-service attacks. • No support for traversing firewalls and proxies. • Low-end users cannot join meetings. No media processing is provided. • Easy to use and understand. • Third-party data conferencing applications can be used.

H.323 Based Systems • Many companies provide H.323 based videoconferencing systems, such as Polycom and FVC. • Does not scale well. • H.235 defines the security mechanisms, but most H.323 based systems do not implement it yet. • H.323 based systems are not firewall friendly: they require almost all ports to be open. • Only a limited range of heterogeneous clients can be supported. • Services are not easy to understand or develop. • T.120 defines data conferencing: whiteboard sharing, file transfer and application sharing.

H.323 Centralized Multipoint Conferencing

H.323 Decentralized Multipoint Conferencing

H.323 MCU Cascading Architecture

VRVS (Virtual Rooms Videoconferencing System) • Uses software reflectors to distribute audio/video streams. • Not open source; no implementation details are publicly available. • Can traverse firewalls, NATs and proxies.

GlobalMMCS Overview • Videoconferencing Tasks: • Audio/Video Distribution • Media processing • Meeting management

Evaluation of GlobalMMCS • Scalability: Provides scalability by separating media processing from media delivery. • Security: NaradaBrokering (NB) provides all security services. It also takes precautions against denial-of-service and replay attacks. • Traversing through firewalls, proxies and NAT • Supporting heterogeneous clients: Since we provide a scalable media processing framework and many transport protocols, we can support a diverse set of end points. • Easy to develop, maintain and use • Support for data conferencing

Media Distribution Middleware (NaradaBrokering) • Requirements for media delivery: • High bandwidth • Low latency • Tolerates packet loss • NaradaBrokering • NB organizes brokers in a hierarchical, cluster-based architecture. • NB supports dynamic broker and link additions/removals. • Messages are routed only to those brokers that have at least one subscription for them. • NB has a flexible transport mechanism. • NB is JMS compliant and supports reliable message delivery. • NB provides a performance monitoring service.
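
Routing messages only toward brokers with at least one subscription can be illustrated with a per-topic interest table. The sketch below is a flat, single-broker view and an assumption on our part: NaradaBrokering itself uses a hierarchical cluster-based organization, and the class and method names (InterestTable, addSubscription, nextHops) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of interest-based forwarding: an event is forwarded
// only to neighbour brokers that have at least one subscriber downstream.
final class InterestTable {
    // topic -> neighbour broker ids with at least one downstream subscription
    private final Map<String, Set<String>> interested = new HashMap<>();

    // Record that a subscription to `topic` exists behind `neighbour`.
    void addSubscription(String topic, String neighbour) {
        interested.computeIfAbsent(topic, t -> new HashSet<>()).add(neighbour);
    }

    // Called when `neighbour` no longer has any subscriber for `topic`.
    void removeSubscription(String topic, String neighbour) {
        Set<String> s = interested.get(topic);
        if (s == null) return;
        s.remove(neighbour);
        if (s.isEmpty()) interested.remove(topic);
    }

    // Neighbours that should receive the event, never echoing it back
    // to the broker it arrived from.
    Set<String> nextHops(String topic, String arrivedFrom) {
        Set<String> hops = new HashSet<>(interested.getOrDefault(topic, Set.of()));
        hops.remove(arrivedFrom);
        return hops;
    }
}
```

A topic with no entry yields an empty set, so the event stops at this broker instead of flooding the network.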

NaradaBrokering broker organization

Incorporating Support for Audio/Video Delivery in NaradaBrokering • Adding support for an unreliable transport protocol, UDP • Implementing a distributed topic number generation mechanism • Designing a unique ID generation mechanism • Designing a new compact event • Adding support for legacy RTP clients • Improving the routing algorithm

Implementing a Distributed Topic Number Generation Mechanism • Requirements for topic number generation: • spatial independence of a topic generator • temporal independence of a topic generator • acceptable size • One topic number generator runs in every broker. • A 20-bit topic generator id guarantees spatial independence: • $2^{20} = 1,048,576$ topic number generators • A 44-bit timestamp provides temporal independence: • $2^{44} = 17,592,186,044,416$ distinct timestamp values • 557 years with one-millisecond resolution. • Together, the id and the timestamp fit in 64 bits (8 bytes); UUID solves a similar problem with 16 bytes. • This mechanism can also be used to generate unique ids.
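
The id and timestamp sizes are read here as 20 and 44 bits, which is what the quoted $2^{20}$ and $2^{44}$ counts imply; together they fill one 64-bit Java long. The sketch below packs them that way. Placing the id in the high bits and bumping the timestamp when two numbers are requested in the same millisecond are our assumptions, not details taken from NaradaBrokering.

```java
// Hypothetical sketch: 64-bit topic number = 20-bit generator id (high bits)
// followed by a 44-bit millisecond timestamp (low bits).
final class TopicNumberGenerator {
    static final int ID_BITS = 20;
    static final int TIME_BITS = 44;
    static final long TIME_MASK = (1L << TIME_BITS) - 1;

    private final long generatorId;
    private long lastTimestamp = -1;

    TopicNumberGenerator(long generatorId) {
        if (generatorId < 0 || generatorId >= (1L << ID_BITS))
            throw new IllegalArgumentException("generator id must fit in 20 bits");
        this.generatorId = generatorId;
    }

    // Pack an id and a timestamp into one 64-bit number.
    static long compose(long generatorId, long millis) {
        return (generatorId << TIME_BITS) | (millis & TIME_MASK);
    }

    static long idOf(long topicNumber)   { return topicNumber >>> TIME_BITS; }
    static long timeOf(long topicNumber) { return topicNumber & TIME_MASK; }

    // Returns a fresh number; if called twice in the same millisecond, the
    // timestamp is advanced so one generator never repeats a value.
    synchronized long next() {
        long now = System.currentTimeMillis() & TIME_MASK;
        long ts = Math.max(now, lastTimestamp + 1);
        lastTimestamp = ts;
        return compose(generatorId, ts);
    }
}
```

The 557-year figure follows from `(1L << 44) / (1000L * 60 * 60 * 24 * 365)`, which evaluates to 557 with integer division.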

Designing a New Event • In publish/subscribe messaging systems, messages tend to have many headers. • A message in the JMS API has at least 10 headers. These headers take around 200 bytes when they are serialized for transfer over the network. • A ULAW audio packet covering 20 ms is 172 bytes long and carries a 64 kbps audio stream. Adding an extra 200 bytes of header to each audio packet raises the bandwidth requirement to about 148 kbps. • More headers also make serialization and deserialization more costly. • RTPEvent has four headers, 14 bytes in total: • Event and Media headers are 1 byte each • Topic Name is 8 bytes • Source info is 4 bytes.
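
A small sketch makes the header arithmetic concrete. It lays out a 14-byte header (1 + 1 + 8 + 4 bytes) in front of a raw RTP packet with java.nio.ByteBuffer; the field order and big-endian encoding are assumptions, and the class name RtpEventHeader is hypothetical. The bitsPerSecond helper reproduces the bandwidth figures for fixed-size packets sent at a fixed interval.

```java
import java.nio.ByteBuffer;

// Illustrative RTPEvent layout (assumed field order):
// [event header 1B][media header 1B][topic number 8B][source info 4B][RTP packet]
final class RtpEventHeader {
    static final int SIZE = 1 + 1 + 8 + 4; // 14 bytes

    static byte[] encode(byte eventHeader, byte mediaHeader, long topic,
                         int source, byte[] rtpPacket) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE + rtpPacket.length);
        buf.put(eventHeader).put(mediaHeader).putLong(topic).putInt(source).put(rtpPacket);
        return buf.array();
    }

    static long topicOf(byte[] event) { return ByteBuffer.wrap(event).getLong(2); }
    static int sourceOf(byte[] event) { return ByteBuffer.wrap(event).getInt(10); }

    // Bits per second for packets of `packetBytes` sent every `intervalMs` ms.
    static double bitsPerSecond(int packetBytes, int intervalMs) {
        return packetBytes * 8.0 * (1000.0 / intervalMs);
    }
}
```

For 20 ms packets (50 per second), bitsPerSecond(172, 20) gives 68,800 bit/s: 20 ms of 8 kHz ULAW is 160 bytes of payload (exactly 64 kbps), and the remaining 12 bytes are consistent with an RTP header. With a 200-byte header, bitsPerSecond(372, 20) gives 148,800 bit/s; with the 14-byte RTPEvent header, bitsPerSecond(186, 20) gives 74,400 bit/s.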

Supporting Legacy RTP Clients • RTPLinks receive raw RTP packets over UDP or multicast from legacy systems, wrap them in RTPEvents and propagate them through the broker node. They also receive RTPEvents from the broker node and send them to clients as raw RTP packets. • Each RTPLink opens two sockets: one for RTP and the other for RTCP. Similarly, it subscribes to two topics: one for RTP and the other for RTCP. • Some RTP sessions might have more than one media stream; in that case, each stream might be published to a different topic. • RTPLinks can be managed either statically or dynamically.

Performance Tests of NaradaBrokering • The Characteristics of Audio and Video Streams • Quality Assessment of Media Delivery • Performance Tests for One Broker • Performance Tests for Distributed Brokers • Wide-Area Media Delivery Tests

Characteristics of Audio and Video Streams • Audio streams are composed of fixed-size packets sent at regular intervals. • We chose a 64 kbps ULAW audio stream for the tests: • One audio packet is sent every 30 ms. Each audio packet is 252 bytes. • There are 4100 packets in total over 2 min. • Video codecs also encode frames periodically. However, each frame may span multiple video packets, and full picture update frames have many more packets. • We chose the H.263 video format, with an average bandwidth of 280 kbps, for 2 min: • 15 frames are encoded every second, i.e. one frame every 66 ms. • 1800 frames and 5610 packets in total, on average 3.1 packets per frame. • One full picture update every 60 frames, or every 4 seconds.

Quality Assessment of Media Delivery • There are three important factors: latency, jitter and packet loss. • ITU recommends that the mouth-to-ear latency of audio should be: • less than 400 ms for acceptable quality • less than 300 ms for good quality • less than 150 ms for excellent quality. • The total latency is the combination of: • processing at the sender and receiver • transmission latency • routing latency in the broker network • We limit the routing latency to at most 100 ms. • Packets that take more than 100 ms are labeled as late arrivals. • We limit the jitter caused by routing to 10 ms. • We limit the loss rate to 1.0%.
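
The three limits can be checked per receiver from send and receive timestamps. The sketch below assumes both timestamps come from a common clock, marks lost packets with a negative receive time, and uses the RFC 3550 interarrival jitter estimator; the deck does not say which jitter definition the tests used, so that choice and the strict "no late arrivals" reading are assumptions.

```java
// Hypothetical per-receiver quality metrics. recvMs[i] < 0 marks a lost
// packet; sender and receiver timestamps are assumed to share one clock.
final class DeliveryStats {
    static final double LATE_MS = 100.0;      // routing-latency limit
    static final double MAX_JITTER_MS = 10.0; // routing-jitter limit
    static final double MAX_LOSS = 0.01;      // 1.0 %

    int sent, received, lateArrivals;
    double meanLatencyMs, jitterMs;

    static DeliveryStats of(double[] sendMs, double[] recvMs) {
        DeliveryStats s = new DeliveryStats();
        s.sent = sendMs.length;
        double sum = 0, jitter = 0, prevTransit = Double.NaN;
        for (int i = 0; i < sendMs.length; i++) {
            if (recvMs[i] < 0) continue;            // lost packet
            double transit = recvMs[i] - sendMs[i]; // per-packet latency
            s.received++;
            sum += transit;
            if (transit > LATE_MS) s.lateArrivals++;
            // RFC 3550-style estimator over consecutive received packets.
            if (!Double.isNaN(prevTransit)) {
                jitter += (Math.abs(transit - prevTransit) - jitter) / 16.0;
            }
            prevTransit = transit;
        }
        s.meanLatencyMs = s.received == 0 ? 0 : sum / s.received;
        s.jitterMs = jitter;
        return s;
    }

    double lossRate() {
        return sent == 0 ? 0 : 1.0 - (double) received / sent;
    }

    // Strict reading of the limits: any late arrival fails the stream.
    boolean meetsLimits() {
        return lateArrivals == 0 && jitterMs <= MAX_JITTER_MS && lossRate() <= MAX_LOSS;
    }
}
```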

Performance Tests for One Broker • Single Meeting Tests • Single audio meeting tests • Single video meeting tests • Audio + Video meeting tests • Multiple Meeting Tests • Multiple audio meeting tests • Multiple video meeting tests • Multiple Audio + Video meeting tests

Single Meeting Tests • One transmitter and 12 measuring receivers; the other receivers are passive. • Tests are conducted on a Linux cluster of 8 identical machines, each with dual Intel Xeon 2.4 GHz CPUs, 2 GB of memory and the Linux 2.4.22 kernel. All programs are written in Java. The cluster nodes are connected by a gigabit network.

Single Audio Meeting Tests I

Single Audio Meeting Tests II • The latency of the first user is constant and does not depend on the number of users in a meeting. • Each audio packet is independent of the others, and the routing of each packet is completed before the next one arrives. Therefore, all audio packets in the stream take almost the same amount of time to reach a client. • The broker saturates when the latency of the last user exceeds 30 ms. • 1500 users can be supported in an audio meeting.

Broker saturation in a single audio meeting • Latency values for the middle user in a single audio meeting with 1600 participants.

Single Video Meeting Tests I

Single Video Meeting Tests II • Latency values for the last receiver in a single video meeting with 400 participants. • Peaks correspond to full picture update frames. • One broker can support at most 400 participants because of late-arriving packets, although the broker itself does not saturate until there are 1000 participants. • The main cause of the late-arriving packets is the full picture updates.

Audio and Video Combined Meeting Tests • Each meeting affects the other. • Our initial tests showed that a video meeting significantly degrades the performance of an audio meeting. Therefore, we gave audio routing priority at the broker. • There are two queues at the broker: audio and non-audio. An arriving audio packet is routed before any queued non-audio packets, once the routing of the current packet is finished. • With 600 participants, there is only a 5 ms difference in audio latency. Therefore, the impact of the video meeting on the performance of the audio meeting is not significant.
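
The two-queue scheme is a non-preemptive priority discipline: the routing thread finishes the packet it is working on, then always drains the audio queue before touching the non-audio queue. A minimal sketch, with the hypothetical name PriorityRouter and a generic packet type:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of non-preemptive audio priority with two FIFO queues.
final class PriorityRouter<P> {
    private final Queue<P> audio = new ArrayDeque<>();
    private final Queue<P> other = new ArrayDeque<>();

    synchronized void enqueue(P packet, boolean isAudio) {
        (isAudio ? audio : other).add(packet);
    }

    // Called by the routing thread only after it has finished the current
    // packet, so an in-progress packet is never interrupted.
    synchronized P nextToRoute() {
        P p = audio.poll();
        return p != null ? p : other.poll(); // null when both are empty
    }
}
```

With this discipline an audio packet waits at most for the packet currently being routed plus any audio packets ahead of it, regardless of how many video packets are queued.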

Comparison of single video meetings and audio + video meetings • This test shows that the impact of an audio meeting on the performance of a video meeting is not significant. • In audio and video combined meetings, the broker supports almost the same number of participants as in the case of single video meetings. The main reason for this is the better utilization of broker resources when there are two concurrent meetings.

Multiple Video Meeting Tests

Latency values for each video packet when there are 30 meetings with 600 participants. • This graph shows no latency peaks for full picture update frames, unlike the single video meeting case.

Summary of Single Broker Tests • 1500 participants are supported in one audio meeting • 400 participants are supported in one video meeting • Up to 400 audio participants and 400 video participants are supported in audio + video meetings. • 700 participants can be supported in 35 video meetings each having 20 participants • 1300 participants can be supported in 65 audio meetings each having 20 participants • 20 audio and 20 video meetings can be supported each having 20 participants.

Performance Tests for Distributed Brokers • We gave inter-broker packet delivery priority over local client deliveries. • This lets packets travel through many brokers with very little overhead, which lets the broker network scale. • It also prevents one overloaded broker from severely affecting the performance of other brokers.
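
Giving inter-broker delivery priority means a broker hands each event to its neighbour brokers before working through its (possibly long) list of local clients, so downstream brokers can start routing early. The sketch below expresses that as two pending-send queues; the class name TwoLevelDispatcher and the BiConsumer send callback are hypothetical stand-ins for the broker's transport layer.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch: pending inter-broker sends always run before
// pending local-client sends.
final class TwoLevelDispatcher {
    private final ArrayDeque<Runnable> brokerSends = new ArrayDeque<>();
    private final ArrayDeque<Runnable> clientSends = new ArrayDeque<>();

    // Queue the deliveries for one event; `send` receives (destination, event).
    void accept(String event, List<String> neighbourBrokers, List<String> localClients,
                BiConsumer<String, String> send) {
        for (String b : neighbourBrokers) brokerSends.add(() -> send.accept(b, event));
        for (String c : localClients) clientSends.add(() -> send.accept(c, event));
    }

    // Perform one send, preferring inter-broker work; false when idle.
    boolean step() {
        Runnable r = brokerSends.poll();
        if (r == null) r = clientSends.poll();
        if (r == null) return false;
        r.run();
        return true;
    }
}
```

Because an overloaded broker still forwards to its neighbours first, its local backlog delays only its own clients rather than the rest of the network.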

Test results with single and double queuing

Single Video Meeting Tests for Distributed Brokers • Each broker has an equal number of participants. • We gather results from the first and last user at each broker.

Latencies from 4 brokers • Broker1 and Broker2 have very similar latency values. • Broker3 and Broker4 have similar and slightly better latency values. • Going through multiple brokers does not introduce considerable overhead. • Scalability of the system can be increased almost linearly by adding new brokers.

Multiple Meeting Tests for Distributed Brokers • The same setting as the single video meeting tests; however, all brokers were running on cluster 2. • The behavior of the broker network is more complex with multiple concurrent meetings than with a single meeting. • Multiple meetings provide both opportunities and challenges. Conducting multiple concurrent meetings on the broker network can increase both the quality of service and the number of supported users, as long as the meeting sizes and the distribution of clients among brokers are managed appropriately. • The best broker utilization is achieved when multiple streams arrive at a broker and each incoming stream is delivered to many receivers. If all brokers are fully utilized in this fashion, a multi-broker network provides better service to more participants.

Multiple Video Meeting Tests • 4 brokers can support 48 meetings with 1920 users in total (40 users per meeting) with excellent quality. • This number is higher than in the single video meeting tests, in which four brokers supported up to 1600 users. • When we repeated the same test with a meeting size of 20, 1400 participants could be supported.

Latency values and loss rates for meeting size 40

Wide-Area Media Delivery Tests • We tested two cases: • single broker at Indiana • one broker at each site • We tested with five distant sites: • Syracuse, NY, • Tallahassee, Florida, • Cardiff, UK • Two sites at Bloomington, IN

Summary of Wide-Area Tests • Running brokers at distributed locations has many benefits: • It saves bandwidth and eliminates bandwidth limitations. • Transferring a smaller number of streams yields better transmission with smaller latency, jitter and loss rates. • Load is distributed among many brokers, so more users can be served with better quality. • Sender-to-receiver transmission latency can be reduced considerably by running brokers at geographically distant locations. • The networks we used provided excellent service with very small loss rates, latency and jitter. • Network connections need to be checked for quality. The Cardiff site could not support even 10 video streams (3 Mbps), well below its full capacity (10 Mbps).

Meeting Management Architecture and Services • There are three main components: • The Meeting Management Unit starts/ends meetings and handles user joins and leaves. • The Media Processing Unit provides audio mixing, video mixing and image grabbing. • A unified framework distributes service providers and manages the interactions among system components.

Messaging Among System Components • Although some messages are sent to a group of destinations, many messages are destined for a single target. Therefore, an efficient message exchange mechanism should be designed. • We use reliable JMS messages for communication among the components of the system. • This simplifies building a scalable solution, since messages can be delivered to multiple destinations without the publisher explicitly knowing them. • Messaging semantics: • Request/Response messaging • Group messaging • Event-based messaging

Topic Naming Conventions • Two types of topics are needed: group topics and unique component topics. • All topic names start with a common root, GlobalMMCS. • Group topic names are constructed by adding the component name to the root: • GlobalMMCS/AudioSession • GlobalMMCS/AudioMixerServer • Unique component topic names are constructed by adding the unique ids: • GlobalMMCS/AudioSession/<sessionID> • GlobalMMCS/AudioMixerServer/<serverID> • Sometimes a component communicates with many different components; in that case, one more layer distinguishes these communication channels: • GlobalMMCS/AudioSession/<sessionID>/AudioMixerServer • GlobalMMCS/AudioSession/<sessionID>/RtpLinkManager
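
The three naming levels compose mechanically, so a tiny helper keeps them consistent across components. This is a hypothetical illustration (the class name Topics and its methods are ours); only the root "GlobalMMCS", the "/" separator and the component names come from the convention itself.

```java
// Hypothetical helper for the GlobalMMCS topic naming convention.
final class Topics {
    static final String ROOT = "GlobalMMCS";

    // Group topic, e.g. GlobalMMCS/AudioMixerServer
    static String group(String component) {
        return ROOT + "/" + component;
    }

    // Unique component topic, e.g. GlobalMMCS/AudioSession/<sessionID>
    static String unique(String component, String id) {
        return group(component) + "/" + id;
    }

    // Extra channel layer, e.g. GlobalMMCS/AudioSession/<sessionID>/RtpLinkManager
    static String channel(String component, String id, String peerComponent) {
        return unique(component, id) + "/" + peerComponent;
    }
}
```

For example, an AudioSession with id `s1` would listen to `Topics.channel("AudioSession", "s1", "AudioMixerServer")` for replies from its mixer.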

Service Distribution Framework • A unified framework for distributing many types of service providers • Addressing: each service provider and consumer is identified by a unique topic name. • Service discovery: a dynamic discovery mechanism using Inquiry and ServiceDescription messages. • Service selection: the consumer selects the best service provider. • Service execution: the consumer executes the service by sending a Request message. • Advantages: • Fault tolerant • Scalable • Location independent
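
The selection step can be sketched on its own: after publishing an Inquiry to a group topic, the consumer holds a list of ServiceDescription replies and picks one provider's unique topic for the Request. What "best" means is not specified, so ranking by the fraction of occupied session slots is an assumption, as are the record fields. Java 16 or later is assumed for the record.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical selection over ServiceDescription replies; "best" is assumed
// to mean the provider with the lowest fraction of occupied session slots.
final class ServiceSelector {
    record ServiceDescription(String providerTopic, int activeSessions, int maxSessions) {
        boolean hasCapacity() { return activeSessions < maxSessions; }
        double load() { return (double) activeSessions / maxSessions; }
    }

    // Returns the unique topic of the chosen provider, or empty if none can serve.
    static Optional<String> select(List<ServiceDescription> replies) {
        return replies.stream()
                .filter(ServiceDescription::hasCapacity)
                .min(Comparator.comparingDouble(ServiceDescription::load))
                .map(ServiceDescription::providerTopic);
    }
}
```

A provider that has crashed simply never replies to the Inquiry and is never selected, which is one way the framework's fault tolerance plays out in practice.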

Session Management • Audio and video sessions are managed separately. • AudioSession objects manage audio sessions and VideoSession objects manage video sessions • MeetingManager objects act as factories for session objects. They initialize and end them. • AudioSession and VideoSession objects provide session management services to participants, such as user joins and leaves. While handling these requests, they usually talk to other system components, such as media processing units and RTP link managers.

JMS message paths for an AudioSession

Audio Mixing & Performance Tests • 6 speakers in each mixer, two of whom were talking continually. • One additional audio stream is constructed from the mix of all speakers. • All streams were 64 kbps ULAW. • The machine: Windows XP, 512 MB memory, 2.5 GHz Intel Pentium 4 CPU. • This machine can support around 20 audio mixing sessions.
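
Mixing ULAW streams cannot be done on the compressed bytes directly, because μ-law is logarithmic. The textbook approach, sketched below, decodes each frame to 16-bit linear PCM with the standard G.711 μ-law expansion, sums the samples, clips to the 16-bit range and re-encodes. Whether the GlobalMMCS mixer works exactly this way is not stated; the class name UlawMixer is hypothetical.

```java
// Sketch of G.711 mu-law mixing: decode -> sum -> clip -> encode.
final class UlawMixer {
    private static final int BIAS = 0x84;  // 132, standard mu-law bias
    private static final int CLIP = 32635; // largest encodable magnitude

    static int decode(byte ulaw) {
        int u = ~ulaw & 0xFF;              // mu-law bytes are stored inverted
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int sample = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (u & 0x80) != 0 ? -sample : sample;
    }

    static byte encode(int pcm) {
        int sign = (pcm >> 8) & 0x80;      // 0x80 for negative samples
        if (sign != 0) pcm = -pcm;
        if (pcm > CLIP) pcm = CLIP;
        pcm += BIAS;
        int exponent = 7;                  // position of the highest set bit
        for (int mask = 0x4000; (pcm & mask) == 0 && exponent > 0; mask >>= 1) exponent--;
        int mantissa = (pcm >> (exponent + 3)) & 0x0F;
        return (byte) ~(sign | (exponent << 4) | mantissa);
    }

    // Mix equal-length ULAW frames (e.g. 240 bytes = 30 ms at 8 kHz).
    static byte[] mix(byte[]... frames) {
        int n = frames[0].length;
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            int sum = 0;
            for (byte[] f : frames) sum += decode(f[i]);
            out[i] = encode(Math.max(-32768, Math.min(32767, sum)));
        }
        return out;
    }
}
```

Each output sample costs one decode per input stream plus one encode, so the per-session cost grows linearly with the number of speakers in the mixer.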

JMS message paths for a VideoSession

Video Mixing & Performance Tests • Four video streams are mixed into one video stream. • Incoming video streams were 150 kbps H.261 streams. • The mixed video stream was H.263 at 18 fps. • Linux machine with 1 GB memory and dual 1.8 GHz Intel Xeon CPUs. • Only 3 video mixers could be served.
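
One common way to mix four video streams is to tile the decoded frames into a 2×2 grid; for instance, four QCIF (176×144) inputs exactly fill one CIF (352×288) frame. The sketch below shows only that compositing step on row-major pixel arrays. The layout, the int[] pixel representation and the class name QuadMixer are assumptions, and the H.261 decoding and H.263 re-encoding, which dominate the real cost, are omitted.

```java
// Hypothetical 2x2 compositing of four equally sized decoded frames.
final class QuadMixer {
    // frames[k] is a w*h row-major pixel array; the result is (2w) x (2h).
    static int[] compose(int[][] frames, int w, int h) {
        if (frames.length != 4) throw new IllegalArgumentException("need 4 frames");
        int outW = 2 * w;
        int[] out = new int[outW * 2 * h];
        for (int k = 0; k < 4; k++) {
            int x0 = (k % 2) * w, y0 = (k / 2) * h; // quadrant origin
            for (int y = 0; y < h; y++) {
                // copy one row of frame k into its quadrant
                System.arraycopy(frames[k], y * w, out, (y0 + y) * outW + x0, w);
            }
        }
        return out;
    }
}
```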

Mixed video streams in various media players
