University of Plymouth United Kingdom {L.Sun; E.Ifeachor}@plymouth.ac.uk

New Models for Perceived Voice Quality Prediction and their Applications in Playout Buffer Optimization for VoIP Networks
Dr. Lingfen Sun, Prof Emmanuel Ifeachor
University of Plymouth, United Kingdom
{L.Sun; E.Ifeachor}@plymouth.ac.uk

Outline
• Background
  • Speech quality for VoIP networks
  • Current status
  • Aims of the project
• Main Contributions
  • Novel non-intrusive voice quality prediction models
  • Novel perceptual-based speech quality optimization mechanism (e.g. jitter buffer optimization)
• Conclusions and Future Work

Background – Speech Quality for VoIP Networks
• VoIP speech quality: end-user perceived quality (MOS) is an important metric.
• It is affected by IP network impairments and other impairments.
• Voice quality measurement: subjective (MOS) or objective (intrusive or non-intrusive).
[Figure: SCN, gateway, IP network, gateway, SCN. Intrusive measurement compares reference speech with degraded speech to give MOS; non-intrusive measurement gives MOS as the end-to-end perceived speech quality. SCN: Switched Comm. Networks (PSTN, ISDN, GSM …)]

Current Status and Problems
• Lack of an efficient non-intrusive speech quality measurement method
  • E-model (a complicated computational model)
  • Models/parameters are derived from subjective tests, which are time-consuming and expensive, so only limited models exist
• Lack of perceptual optimization control methods
  • Buffer optimization and QoS control are based only on individual network parameters
  • Control is not perceptual-based

Aims of the Project
• To develop novel and efficient methods/models for non-intrusive quality prediction.
• To apply the models to perceptual-based optimization control (e.g. buffer optimization or adaptive sender-bit-rate QoS control).
[Figure: Sender (voice source, encoder, packetizer), IP network, Receiver (jitter buffer, de-packetizer, decoder, voice receiver); non-intrusive measurement gives the end-to-end perceived voice quality (MOS).]

Novel Non-intrusive Voice Quality Prediction
• Uses intrusive quality measurement (e.g. PESQ) to build models that predict voice quality non-intrusively, which avoids subjective tests.
• A generic method which can be applied to audio, image and video.
[Figure: Intrusive method: reference speech and degraded speech (packet loss, delay, codec …) → PESQ → MOS(PESQ); MOS(PESQ) and delay → E-model → measured MOSc. Non-intrusive method: VoIP network → new model (regression or ANN models) → predicted MOSc.]

New Structure to Obtain MOSc
• PESQ can only predict one-way listening speech quality (expressed as MOS).
• With a new combined PESQ/E-model structure, a conversational speech quality (MOSc) can be obtained as the measured MOSc.
[Figure: reference speech and degraded speech → PESQ → MOS (PESQ) → (MOS → R → Ie) → Ie; end-to-end delay → delay model → Id; Ie and Id → E-model → MOSc.]
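The combined PESQ/E-model structure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the R-to-MOS mapping is the standard E-model (ITU-T G.107) formula, `R0 = 93.2` is the commonly used default R value, and `delay_impairment` uses a widely cited piecewise-linear simplification of Id rather than the delay model on the slide. All function names are hypothetical.

```python
R0 = 93.2  # assumed default R value with no loss or delay impairment


def r_to_mos(r):
    """Standard E-model mapping from rating factor R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def mos_to_r(mos, lo=6.5, hi=100.0):
    """Invert r_to_mos by bisection on [6.5, 100], where it is increasing."""
    mos = min(max(mos, r_to_mos(lo)), 4.5)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def measured_ie(pesq_mos):
    """MOS -> R -> Ie: equipment impairment implied by a PESQ score."""
    return R0 - mos_to_r(pesq_mos)


def delay_impairment(delay_ms):
    """Assumed simplified Id: linear, steeper above 177.3 ms one-way delay."""
    extra = delay_ms - 177.3
    return 0.024 * delay_ms + (0.11 * extra if extra > 0 else 0.0)


def conversational_mos(pesq_mos, delay_ms):
    """Combine Ie (from PESQ) and Id (from delay) into MOSc."""
    r = R0 - measured_ie(pesq_mos) - delay_impairment(delay_ms)
    return r_to_mos(r)


if __name__ == "__main__":
    for delay in (0, 100, 200, 300):
        print(delay, round(conversational_mos(3.8, delay), 2))
```

With zero delay the sketch returns the PESQ score unchanged, and MOSc falls as the end-to-end delay grows, which is the behaviour the structure is meant to capture.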

Regression based Models (1)
• Nonlinear regression models are derived for Ie based on PESQ/PESQ-LQ.
• Ie is then combined with Id to obtain MOSc.
[Figure (a): codec and packet loss → Ie model → Ie; delay (d) → Id model → Id; Ie and Id → E-model → MOSc.]
[Figure (b): speech database → reference speech → encoder, loss model, decoder → degraded speech; reference and degraded speech → PESQ/PESQ-LQ → MOS (PESQ) → (MOS → R → Ie) → measured Ie → nonlinear regression model (Ie model) → predicted Ie.]

Regression based Models (2)
• Ie can be modelled by a logarithmic fitting function.
• The fitted parameters differ for each codec (PESQ).
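To make the logarithmic fit concrete, here is a minimal sketch that assumes the form Ie = a·ln(1 + b·ρ) + c, with ρ the packet loss rate in percent. The exact form and the parameter values are assumptions for illustration, and the data points are synthetic (generated from the assumed model), not measurements from the slides. The fit scans a grid of b values and solves for a and c by ordinary least squares at each one, then reports SSE and R² as goodness-of-fit measures.

```python
import math


def ie_model(loss_pct, a, b, c):
    """Assumed logarithmic Ie model: a * ln(1 + b * loss) + c."""
    return a * math.log(1.0 + b * loss_pct) + c


def fit_ie(loss, ie, b_grid):
    """Grid search over b; closed-form least squares for a and c.

    Returns (a, b, c, sse, r_squared) for the best b in b_grid.
    """
    n = len(loss)
    mean_y = sum(ie) / n
    sst = sum((y - mean_y) ** 2 for y in ie)
    best = None
    for b in b_grid:
        x = [math.log(1.0 + b * p) for p in loss]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ie))
        a = sxy / sxx
        c = mean_y - a * mean_x
        sse = sum((ie_model(p, a, b, c) - y) ** 2 for p, y in zip(loss, ie))
        if best is None or sse < best[3]:
            best = (a, b, c, sse, 1.0 - sse / sst)
    return best


if __name__ == "__main__":
    # Synthetic example: loss rates 0..20 % and Ie from hypothetical parameters.
    loss = [float(p) for p in range(0, 21, 2)]
    ie = [ie_model(p, 20.0, 0.5, 10.0) for p in loss]
    grid = [i / 20.0 for i in range(1, 41)]  # b from 0.05 to 2.0
    a, b, c, sse, r2 = fit_ie(loss, ie, grid)
    print(a, b, c, sse, r2)
```

A grid search keeps the sketch dependency-free; a nonlinear least-squares routine would serve the same purpose on real PESQ-derived Ie values.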

Regression Models for AMR (12.2 kb/s)
• E.g. for AMR (12.2 kb/s), the goodness of fit is SSE = 2.83 and R² = 0.998.
[Figure: MOS vs. packet loss and delay]

Perceptual-based Buffer Optimization
• Motivation
  • Existing buffer algorithms are based only on individual network parameters (e.g. delay or loss)
  • They target only minimum average delay or minimum late-arrival loss, not maximum MOS.
  • There is a need to design a buffer algorithm that achieves optimum perceived speech quality.
• Contribution
  • A perceptual-based optimization jitter buffer algorithm
  • Use regression based models for buffer optimization
  • Use a minimum impairment criterion instead of the traditional maximum MOS score
  • A Weibull delay distribution based on trace analysis
  • A perceptual-based optimization of playout buffer algorithm

Impairment Function Im
• Define: impairment function Im
[Figure: Weibull distribution of network delay, showing the playout delay d and the buffer loss ρb]

Minimum Impairment Criterion
• Define: minimum impairment criterion
  • Given: network delay dn, network loss ρn and codec type
  • Estimate: an optimized playout delay dopt
  • Such that: the minimum Im is reached.
[Figure: Im at candidate playout delays d1, d2, d3, d4, with the minimum Im marked]
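The criterion can be illustrated with a small search over candidate playout delays. This is a sketch under several assumptions that do not come from the slides: Im is taken as Ie + Id, buffer loss is the Weibull tail probability beyond the playout delay, total loss combines network and buffer loss as ρn + (1 − ρn)·ρb, Ie uses an assumed logarithmic model with hypothetical parameters, and Id uses a common piecewise-linear simplification. All names are hypothetical.

```python
import math


def weibull_tail(d, mu, alpha, gamma):
    """P(network delay > d) for a Weibull with location mu, scale alpha, shape gamma."""
    if d <= mu:
        return 1.0
    return math.exp(-(((d - mu) / alpha) ** gamma))


def delay_impairment(delay_ms):
    """Assumed simplified Id: linear, steeper above 177.3 ms."""
    extra = delay_ms - 177.3
    return 0.024 * delay_ms + (0.11 * extra if extra > 0 else 0.0)


def loss_impairment(loss_pct, a, b, c):
    """Assumed logarithmic Ie model with hypothetical parameters a, b, c."""
    return a * math.log(1.0 + b * loss_pct) + c


def impairment(d_play, net_loss, weibull, ie_params):
    """Im(d): loss impairment (network + late-arrival loss) plus delay impairment."""
    buf_loss = weibull_tail(d_play, *weibull)
    total_loss = net_loss + (1.0 - net_loss) * buf_loss
    return loss_impairment(100.0 * total_loss, *ie_params) + delay_impairment(d_play)


def optimum_playout_delay(net_loss, weibull, ie_params, d_max=400):
    """Exhaustive 1 ms search for the playout delay that minimises Im."""
    candidates = range(int(weibull[0]), d_max + 1)
    return min(candidates, key=lambda d: impairment(d, net_loss, weibull, ie_params))


if __name__ == "__main__":
    weibull = (50.0, 20.0, 1.5)      # hypothetical (location, scale, shape), ms
    ie_params = (20.0, 0.5, 10.0)    # hypothetical Ie model parameters
    d_opt = optimum_playout_delay(0.01, weibull, ie_params)
    print(d_opt, impairment(d_opt, 0.01, weibull, ie_params))
```

A short playout delay discards many late packets (high Ie) while a long one adds delay impairment (high Id), so the minimum of Im lies between the two extremes.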

Perceptual-based Optimization Buffer Algorithm
• For every packet i received, calculate network delay ni
  • if mode == SPIKE then
    • if ni ≤ tail*old_d then mode = NORMAL
  • elseif ni > head*di then
    • mode = SPIKE; old_d = di
  • else
    • update delay records for the past W packets
  • endif
• At the beginning of a talkspurt
  • if mode == SPIKE then
    • di = ni
  • else
    • obtain the three Weibull distribution parameters (location, scale, shape) from the past W packets
    • search for the playout delay d which meets the minimum Im criterion
  • endif
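The spike-detection and per-talkspurt steps above can be sketched as a small state machine. This is an illustrative sketch, not the authors' code: the `head`, `tail` and `window` defaults are placeholder values not given on the slide, the first packet's delay is used to bootstrap di, and the default `choose_delay` is a simple percentile stand-in for the Weibull fit and minimum-Im search, which would be plugged in through the same hook.

```python
from collections import deque

NORMAL, SPIKE = "NORMAL", "SPIKE"


def percentile_delay(history, q=0.99):
    """Stand-in for the Weibull fit + minimum-Im search: a high delay percentile."""
    ordered = sorted(history)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


class PlayoutBuffer:
    def __init__(self, choose_delay=percentile_delay, head=4.0, tail=2.0, window=100):
        self.choose_delay = choose_delay      # maps delay records to a playout delay
        self.head, self.tail = head, tail     # placeholder spike thresholds
        self.history = deque(maxlen=window)   # delay records for the past W packets
        self.mode = NORMAL
        self.d = None                         # current playout delay d_i
        self.old_d = None                     # playout delay when the spike began

    def on_packet(self, n_i, talkspurt_start=False):
        """Process one packet with network delay n_i; return the playout delay."""
        if self.d is None:
            self.d = n_i                      # bootstrap assumption for the first packet
        if self.mode == SPIKE:
            if n_i <= self.tail * self.old_d:
                self.mode = NORMAL            # the spike has ended
        elif n_i > self.head * self.d:
            self.mode = SPIKE                 # a spike has started
            self.old_d = self.d
        else:
            self.history.append(n_i)          # update delay records
        if talkspurt_start:
            if self.mode == SPIKE:
                self.d = n_i                  # follow the spike directly
            elif self.history:
                self.d = self.choose_delay(self.history)
        return self.d


if __name__ == "__main__":
    buf = PlayoutBuffer()
    delays = [(50, True), (52, False), (48, False), (300, True), (90, False), (55, True)]
    for n, start in delays:
        print(n, buf.on_packet(n, talkspurt_start=start), buf.mode)
```

Keeping the delay choice behind a callable means the playout delay only changes at talkspurt boundaries, as on the slide, while the optimization rule itself can be swapped without touching the spike logic.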

Performance Analysis and Comparison (1)
• Selected five traces from UoP to CU (USA), DUT (Germany), BUPT (China), and NC (China).
• Traces 1 and 3 have high delay variation; traces 2, 4 and 5 have low delay variation.

Performance Analysis and Comparison (2)
• The “p-optimum” algorithm achieves the optimum voice quality for all traces.
• The “adaptive” algorithm achieves sub-optimum quality with low complexity.

Conclusions and Future Work
• Conclusions
  • Developed a new methodology and regression models to predict voice quality non-intrusively.
  • Demonstrated the application of the new non-intrusive voice quality prediction models to perceptual-based optimization of playout buffer algorithms.
• Future Work
  • To consider buffer adaptation during a talkspurt in order to achieve the best trade-off between delay, loss and end-to-end jitter.
  • To extend the work to improve the performance of multimedia services (e.g. audio/image/video) over IP networks.

Contact Details
• http://www.tech.plymouth.ac.uk/spmc
• Dr. Lingfen Sun: L.Sun@plymouth.ac.uk
• Prof Emmanuel Ifeachor: E.Ifeachor@plymouth.ac.uk
• Any questions? Thank you!
