QoS Measurement and Management for VoIP

QoS Measurement and Management for VoIP Wenyu Jiang IRT Lab March 5, 2003

Introduction to VoIP & IP Telephony • Transport of voice packets over IP networks • Cost savings • Consolidates voice and data networks • Avoids leased lines, long-distance toll calls • Smart and new services • Call management (filtering, TOD forwarding): CPL • Better than PSTN quality: wide-band codecs • Protocols and Standards • Signaling: SIP (IETF), H.323 (ITU-T) • Transport: RTP/RTCP (IETF)

Practical Issues in VoIP • Quality of Service (QoS) • Internet is a best-effort network • Loss, delay and jitter • Users expect at least PSTN quality for VoIP! • Ease of deployment • Requires seamless integration with legacy networks (PSTN/PBX) • Security is a must • High yardstick of service availability • Can your network achieve 99.999% uptime?

Outline • QoS measurement • Objective vs. subjective metrics • Automated measurement of subjective quality • QoS management: improving your quality • End-to-End: FEC, LBR, PLC • Network provisioning: voice traffic aggregation • Reality check • Performance of end-points (IP phones, …) • Deployment issues in VoIP • Evaluation of VoIP service availability through Internet measurement

Workings of a VoIP Client • Audio is encoded, packetized and transmitted • Forward error correction (FEC) may be used to recover lost packets • Playout control smooths out jitter to minimize late losses; coupled with FEC • Packet loss concealment (PLC) • Last line of “defense” after FEC and playout

LBR: An Alternative to FEC • An (n,k) block FEC code can recover up to n-k losses per block • Low Bit-rate Redundancy (LBR) • Transmit a lower bit-rate version of original audio • No notion of “blocks” • Not bit-exact recovery
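The (n,k) recovery property can be turned into a rough residual-loss estimate. The sketch below is an illustration only: it assumes independent (Bernoulli) loss with probability p and a code in which any k of the n packets are enough to rebuild the block. The function name `residual_loss` is hypothetical and does not come from the talk.

```python
from math import comb


def residual_loss(n: int, k: int, p: float) -> float:
    """Probability that a packet is lost AND cannot be repaired by an (n, k) block code.

    Assumption (not from the slides): losses are independent with probability p,
    and any k received packets out of n are enough to rebuild the block.
    """
    # The packet itself is lost (probability p) and at least n-k of the other
    # n-1 packets in the block are lost too, so total losses exceed n-k.
    others_lost_tail = sum(
        comb(n - 1, j) * p**j * (1 - p) ** (n - 1 - j)
        for j in range(n - k, n)
    )
    return p * others_lost_tail


if __name__ == "__main__":
    for n, k in [(3, 2), (4, 2), (5, 3)]:
        print(f"({n},{k}) at 10% loss -> residual {residual_loss(n, k, 0.10):.4f}")
```

With no redundancy (n = k) the function returns p itself, which is a quick sanity check. Bursty loss violates the independence assumption, so real residual loss can be higher than this estimate.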

Objective QoS Metrics: Loss • Internet packet loss is often bursty • May degrade voice quality more than random (Bernoulli) loss • Characterization of packet loss • 2-state Markov (Gilbert) model: conditional loss prob. • More detailed models exist, but they need more states • Extended Gilbert model, nth order Markov model • Hidden Markov model, Gilbert-Elliott model, inter-loss distance • More states → larger test set, loss of the big picture • Adaptive applications can trade off model accuracy for fast feedback • Gilbert model provides an acceptable compromise
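A minimal sketch of the 2-state (Gilbert) model: estimating its two parameters from a loss trace and generating a synthetic trace from them. The trace encoding (0 = received, 1 = lost) and the names `fit_gilbert` and `simulate_gilbert` are assumptions made for illustration.

```python
import random


def fit_gilbert(trace):
    """Estimate (p, p_cond) from a 0/1 loss trace (1 = lost).

    p      = P(loss | previous packet received)
    p_cond = P(loss | previous packet lost), the conditional loss probability
    """
    after_ok = lost_after_ok = after_loss = lost_after_loss = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev == 0:
            after_ok += 1
            lost_after_ok += cur
        else:
            after_loss += 1
            lost_after_loss += cur
    p = lost_after_ok / after_ok if after_ok else 0.0
    p_cond = lost_after_loss / after_loss if after_loss else 0.0
    return p, p_cond


def simulate_gilbert(p, p_cond, length, seed=0):
    """Generate a synthetic 0/1 loss trace from the two-state model."""
    rng = random.Random(seed)
    state = 0  # start in the "received" state
    trace = []
    for _ in range(length):
        loss_prob = p_cond if state == 1 else p
        state = 1 if rng.random() < loss_prob else 0
        trace.append(state)
    return trace


if __name__ == "__main__":
    synthetic = simulate_gilbert(0.05, 0.5, 100_000)
    print(fit_gilbert(synthetic))
```

When p_cond equals p the model reduces to Bernoulli loss; p_cond greater than p indicates burstiness.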

Effect of Gilbert Loss Model • Loss burst distribution of a packet trace • Roughly, though not exactly, exponential • Effect of loss burstiness on FEC performance • FEC less efficient under bursty loss • [Figure: number of occurrences (log scale) vs. loss burst length, packet trace compared with Gilbert model]

Objective QoS Metrics: Delay • Complementary Conditional CDF (C3DF) • More descriptive than auto-correlation function (ACF) • Delay correlation rises rapidly beyond a threshold • Approximates conditional late loss probability
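The slide does not spell out the C3DF definition. The sketch below assumes it is the probability that a packet's delay exceeds a threshold t given that the delay of the packet `lag` positions earlier also exceeded t, which matches the "conditional late loss probability" reading. The function name `c3df` and the `lag` parameter are hypothetical.

```python
def c3df(delays, t, lag=1):
    """Estimate P[d(i+lag) > t | d(i) > t] from a sequence of per-packet delays.

    Assumed definition, not taken verbatim from the slides.
    Returns NaN when no packet exceeds the threshold.
    """
    conditioned = both = 0
    for earlier, later in zip(delays, delays[lag:]):
        if earlier > t:
            conditioned += 1
            if later > t:
                both += 1
    return both / conditioned if conditioned else float("nan")


if __name__ == "__main__":
    sample_ms = [10, 50, 60, 10, 70, 80, 90, 10]
    print(c3df(sample_ms, t=40))
```

Sweeping t over a range of playout deadlines gives the curve; comparing it with the unconditional P[d > t] shows how strongly late packets cluster.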

Subjective QoS Metrics • Perceived quality • Mean Opinion Score (MOS) • ITU-T P.800/830 • Obtained via listening tests • MOS variations • DMOS (Degradation) • CMOS (Comparison) • MOSc (Conversational): considers delay • A/B preference • Pros: more meaningful to end users • Cons: time consuming, labor intensive

Effect of Loss Model on Perceived Quality • Codec: G.729 (8 kb/s ITU std) • Random (Bernoulli) vs. bursty (Gilbert) loss • Bursty loss → lower MOS • True even when FEC or LBR is used

Going Further: Bridging Objective and Subjective Metrics • The E-model (ITU-T G.107/108) • Originally for telephone network planning • Considers various impairments • Reduces to delay and loss impairment when adapted for VoIP • Objective quality estimation algorithms • Suitable when network statistics are not available, e.g., phone-to-phone service with IP in between • Speech recognition performance may be used as a quality predictor, by comparing recognized text with the original text

The E-model • Map from loss and delay to impairment scores (Ie, Id) • Compute a gross score (R value) and map to MOSc • Limited number of codec loss impairment mappings
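A minimal sketch of the last two steps, from impairment scores to an R value and from R to a MOS estimate. The R-to-MOS polynomial is the mapping given in ITU-T G.107. The reduced form R = R0 - Id - Ie and the default `r0=93.2` are simplifying assumptions for illustration, and the function names are hypothetical; the codec-specific mappings from loss to Ie and from delay to Id are not reproduced here.

```python
def r_value(i_d: float, i_e: float, r0: float = 93.2) -> float:
    """Gross score from delay impairment Id and loss/codec impairment Ie.

    Assumption: all other E-model terms are folded into the base value r0.
    93.2 is a placeholder often quoted as the default rating; adjust as needed.
    """
    return r0 - i_d - i_e


def r_to_mos(r: float) -> float:
    """Map an R value to an estimated MOS using the G.107 polynomial."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Illustrative impairment values only, not measured data.
    for i_d, i_e in [(0, 0), (5, 15), (20, 30)]:
        r = r_value(i_d, i_e)
        print(f"Id={i_d:>2} Ie={i_e:>2} -> R={r:5.1f}, MOS ~ {r_to_mos(r):.2f}")
```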

Using Speech Recognition to Predict MOS • Evaluation of automatic speech recognition (ASR) based MOS prediction • IBM ViaVoice Linux version • Codec used: G.729 • Performance metric • absolute word recognition ratio • relative word recognition ratio

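Recognition Ratio vs. MOS • Both MOS and the absolute recognition ratio (Rabs) decrease as loss probability p increases • Eliminating the intermediate variable p gives a direct mapping from recognition ratio to MOS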

Speaker Dependency • Absolute performance is speaker-dependent • But relative word recognition ratio is not • Suitable for MOS prediction

Summary of QoS Measurement • Loss burstiness: • Affects (generally worsens) perceived quality as well as FEC performance • May be described with, e.g., a Gilbert model • Delay correlation: • Increases rapidly beyond a threshold, revealed through Complementary Conditional CDF (C3DF) • Late losses are also bursty • Perceived quality (MOS) estimation • Analytical: the E-model • If network statistics N/A: relative word recognition ratio can provide speaker-independent MOS prediction

Outline • QoS measurement • Objective vs. subjective metrics • Automated measurement of subjective quality • QoS management: improving your quality • End-to-End: FEC, LBR, PLC • Network provisioning: voice traffic aggregation • Reality check • Performance of VoIP end-points (IP phones, …) • Deployment issues in VoIP • Evaluation of VoIP service availability through Internet measurement

Quality of FEC vs. LBR • FEC is substantially and consistently better • At comparable bandwidth overhead • Across all codec configurations tested • [Figure panels: AMR LBR; G.729+G.723.1 LBR]

Quality of FEC under Bursty Loss • Packet interval T has a stronger effect on MOS with FEC than without FEC

FEC MOS Optimization Considering Delay Effect • Larger T → FEC efficiency ↑, but delay ↑ • Optimizing T with the E-model • Calculate final loss probability after FEC, apply delay impairment of FEC, map to MOSc • Prediction close to FEC MOS test results • Suitable for analytical perceived quality prediction

Trade-off Analysis between Codec Robustness and FEC • 3 loss repair options • FEC, LBR, PLC • Loss-resilient codec • Better PLC • iLBC (IETF) • But higher bit rate • Better than FEC?

Observations and Results • When considering delay: • iLBC is usually preferred in low loss conditions • G.729 or G.723.1 + FEC better for high loss • Example: max bandwidth 14 kb/s • Consider delay impairment (use MOSc)

Effect of Max Bandwidth on Achievable Quality • From 14 to 21 kb/s: significant improvement in MOSc • From 21 to 28 kb/s: marginal change, because FEC adds increasing delay impairment

Provisioning a VoIP Network • Silence detection/suppression • Transmit only during On period, saves bandwidth • Allows traffic aggregation through statistical multiplexing • Characteristics of On/Off patterns in VoIP • Traditionally found to be exponentially distributed • Modern silence detectors (G.729B VAD, NeVoT SD) produce different patterns

Traffic Aggregation Simulation • Token bucket filter with N sources; R: ratio of reserved to peak bandwidth • CDF model resembles trace model in most cases • Exponential (traditional) model • Under-predicts out-of-profile packet probability • Under-prediction ratio ↑ as token buffer size B ↑ • Similar results for NeVoT SD
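A rough sketch of this kind of simulation: N on/off voice sources feed a token bucket, and the fraction of out-of-profile packets is counted. The time-slotted model, the one-packet-per-slot source rate, the geometric on/off durations (a discrete stand-in for the traditional exponential model), and all names and parameter values are assumptions for illustration, not the setup used in the talk.

```python
import random


def onoff_arrivals(n_sources, slots, mean_on, mean_off, seed=0):
    """Packets per slot from n_sources on/off sources (1 packet per slot while On).

    On and Off durations are geometric with the given means, in slots.
    """
    rng = random.Random(seed)
    on = [False] * n_sources
    arrivals = []
    for _ in range(slots):
        count = 0
        for i in range(n_sources):
            if on[i]:
                if rng.random() < 1.0 / mean_on:
                    on[i] = False
            elif rng.random() < 1.0 / mean_off:
                on[i] = True
            count += on[i]
        arrivals.append(count)
    return arrivals


def out_of_profile_fraction(arrivals, rate, depth):
    """Fraction of packets that find no token (one token per packet).

    rate: tokens added per slot; depth: token buffer size B.
    """
    tokens = depth
    total = out = 0
    for packets in arrivals:
        tokens = min(depth, tokens + rate)
        for _ in range(packets):
            total += 1
            if tokens >= 1:
                tokens -= 1
            else:
                out += 1
    return out / total if total else 0.0


if __name__ == "__main__":
    n, ratio = 20, 0.5  # ratio plays the role of R (reserved / peak bandwidth)
    arr = onoff_arrivals(n, 50_000, mean_on=40, mean_off=60)
    for b in (5, 20, 80):
        print(b, out_of_profile_fraction(arr, rate=ratio * n, depth=b))
```

Replacing `onoff_arrivals` with durations drawn from a measured CDF or from a recorded trace is how the exponential model could be compared against the other two.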

Summary of QoS Management • End-to-End • FEC is superior in quality to LBR • Codec robustness is better than FEC in low loss conditions • Combining both schemes brings the best of both sides • Network provisioning • Observation: New silence detectors (G.729B, NeVoT SD) → non-exponential voice On/Off patterns • Result: performance of voice traffic aggregation ↓ under new On/Off patterns • Important in traffic engineering and Service Level Agreement (SLA) validation

Outline • QoS measurement • Objective vs. subjective metrics • Automated measurement of subjective quality • QoS management: improving your quality • End-to-End: FEC, LBR, PLC • Network provisioning: voice traffic aggregation • Reality check • Performance of end-points (IP phones, …) • Deployment issues in VoIP • Assessment of VoIP service availability through Internet measurement

Mouth-to-ear Delay of VoIP End-points • All receivers can adjust M2E delay adaptively whenever it is too low or too high • M2E delay depends mainly on receiver (esp. RAT) • HW phones have relatively low delay (~45-90ms)

But Adaptiveness ≠ Perfection • Symptom of playout buffer underflow • Waveforms are dropped • Occurred at point of delay adjustment • Bugs in software? • Does a LAN imply perfect quality?

Major Observations • Overall: end-points matter a lot! • HW IP phones: 45-90 ms average M2E delay • SW clients: • Messenger 2000 lowest (68 ms), XP (96-120 ms) • cf. GSM to PSTN: 110 ms in either direction • NetMeeting very bad (> 400 ms) • PLC robustness • Acceptable in all 3 IP phones tested, Cisco phone more robust • Silence detection/suppression • Works for speech input • Often fails for non-speech (e.g., music) input • Generates many unnatural gaps • Not good for customer support center (on-hold music)! • Acoustic echo cancellation (AEC): • Good on most IP phones (Echo Return Loss > 40 dB) • But some do not implement AEC at all

Reality Check #2: IP Telephony Deployment • Localized deployment at Columbia Univ. • [Architecture diagram; labels: regular phone, telephone switch/PBX, T1/E1, SIP/PSTN gateway, RTP/SIP, IP phones, core server, sipd (SIP proxy, redirect server), SQL database, web server, web-based configuration, conference server, voicemail server, server status monitoring]

Issues and Lessons Learned • PSTN/PBX integration • Requires full understanding of legacy networks • Lower layer (e.g., T1 line configuration) • Parameters must match on both PSTN/PBX and gateway! • PBX access configurations • To ensure calls go through in both directions • Address translation (dial-plan) in both directions • Previous lessons/experiences can help greatly • E.g., second gateway installed in weeks instead of months • Security • Issue: SIP/PSTN gateway has no authentication feature • Solution: • Use gateway’s access control lists to block direct calls • SIP proxy server handles authentication using record-route

Reality Check #3: VoIP Service Availability • Focus on availability rather than traditional QoS • Delay is a minor issue; FEC recovers most isolated losses • Ability to make a call is vital, especially in emergency • Internet measurement sites: • 14 nodes worldwide, not just Internet2 and the like • Definitions: • Availability = MTBF / (MTBF + MTTR) • Availability = successful calls / first call attempts • Equipment availability: 99.999% (“5 nines”) → about 5 minutes of downtime per year • AT&T: 99.98% availability (1997) • IP frame relay SLA: 99.9% • UK mobile phone survey: 97.1-98.8%
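The two availability definitions and the "5 nines" conversion can be checked with a few lines of arithmetic. This is a sketch: the function names are hypothetical, and treating "first call attempts" as successes plus failures is an assumption.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def availability_from_calls(succeeded: int, failed: int) -> float:
    """Availability = successful calls / first call attempts.

    Assumption: first call attempts = succeeded + failed.
    """
    return succeeded / (succeeded + failed)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("5 nines", 0.99999), ("AT&T 1997", 0.9998),
                     ("IP frame relay SLA", 0.999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")
```

For 99.999% this gives roughly 5.3 minutes per year, consistent with the figure on the slide.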

First Look at Availability • Call success probability: • 62,027 calls succeeded, 292 failed → 99.53% availability • Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58% • Overall network loss • PSTN: once connected, call usually of good quality • exception: mobile phones • Compute % time below loss threshold • 5% loss causes degradation for many codecs • others acceptable till 20%

Network Outages • Sustained packet losses • arbitrarily defined as 8 consecutive lost packets • far beyond recoverable (FEC, interpolation) • 23% of packet losses are outages • Make up a significant part of the 0.25% unavailability • Symmetric: an outage on A→B tends to coincide with one on B→A • Spatially correlated: an outage on A→B tends to coincide with outages on A→X • Not correlated across networks (e.g., I2 and commercial) • Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time

Outage-induced Call Abortion Probability • Long interruption → user likely to abandon call • from E.855 survey: P[holding] = e^(-t/17.26) (t in seconds) • → half the users will abandon the call after about 12 s • 2,566 calls have at least one outage • 946 of 2,566 expected to be dropped → 1.53% of all calls
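The holding-time formula can be checked numerically. The sketch below computes the time at which half the users have given up and an expected number of dropped calls from a list of outage durations. Summing 1 - P[holding] over one outage per call is an assumed way of aggregating, and the durations in the example are made up for illustration.

```python
import math

TAU_SECONDS = 17.26  # time constant from the holding formula on the slide


def p_holding(t: float) -> float:
    """Probability that a user is still holding after an interruption of t seconds."""
    return math.exp(-t / TAU_SECONDS)


def median_abandon_time() -> float:
    """Interruption length at which P[holding] drops to 0.5."""
    return TAU_SECONDS * math.log(2)


def expected_dropped(outage_durations) -> float:
    """Expected number of abandoned calls, one outage duration (seconds) per call.

    Assumption: each call is abandoned with probability 1 - P[holding].
    """
    return sum(1.0 - p_holding(t) for t in outage_durations)


if __name__ == "__main__":
    print(f"half of users abandon after about {median_abandon_time():.1f} s")
    # Hypothetical outage durations, not measurement data.
    print(expected_dropped([2.0, 5.0, 30.0, 120.0]))
```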

Summary of Service Availability • Through several metrics, one can translate from network loss to VoIP service availability (no Internet dial-tone) • Current results show availability far below five 9’s, but comparable to mobile telephony • Outage statistics are similar in research and ISP networks • Working on identifying fault sources and locations • Additional measurement sites are welcome

Conclusions • Measuring QoS • Loss burstiness and delay correlation affect (generally worsen) perceived quality • Bridging objective and subjective metrics: the E-model, or speech recognition based MOS prediction • Performance of real products: IP phones and soft clients • Ensuring/improving QoS • Network provisioning (voice traffic aggregation) • Efficient, but may be expensive to deploy and manage • End-to-End (FEC > LBR, PLC) • Easier to deploy, but must control overhead of FEC • Reality Check • Good implementation at the end-point (e.g., IP phones) is vital • VoIP deployment requires PSTN integration and security • Service availability is crucial for VoIP, but still far from 99.999% over the Internet

Ongoing and Future Work • Sampling Internet performance • Where do the problems reside? • Access networks (Cable, DSL), or • International paths? • How can we solve these problems? • Can adaptive FEC react fast enough to changes in network conditions? • Playout delay behaviors of VoIP end-points • How well do they react to jitter, delay spikes?
