Speech Coding (Part I): Waveform Coding
虞台文

Content • Overview • Linear PCM (Pulse-Code Modulation) • Nonlinear PCM • Max-Lloyd Algorithm • Differential PCM (DPCM) • Adaptive Differential PCM (ADPCM) • Delta Modulation (DM)

Overview

Classification of Coding schemes • Waveform coding • Vocoding • Hybrid coding

Quality versus Bitrate of Speech Codecs

Waveform coding • Encodes the waveform itself in an efficient way • Signal independent • Offers good-quality speech at bit rates of 16 kbps or more • Time-domain techniques: Linear PCM (Pulse-Code Modulation); Nonlinear PCM (μ-law, A-law); Differential Coding (DM, DPCM, ADPCM) • Frequency-domain techniques: SBC (Sub-band Coding), ATC (Adaptive Transform Coding) • Wavelet techniques

Vocoding • 'Voice' + 'coding' • Encodes information about how the speech signal was produced by the human vocal system. • These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps. • However, the reproduced speech often sounds quite synthetic, and the speaker is often not recognisable. • LPC-10 codec: a 2400 bps American military standard.

Hybrid coding • Combines waveform and source coding methods in order to improve the speech quality and reduce the bit rate. • Typical bit-rate requirements lie between 4.8 and 16 kbps. • Technique: analysis-by-synthesis • RELP (Residual Excited Linear Prediction) • CELP (Codebook Excited Linear Prediction) • MPLP (Multipulse Excited Linear Prediction) • RPE (Regular Pulse Excitation)


Linear PCM (Pulse-Code Modulation)

Pulse-Code Modulation (PCM) • A method for quantizing an analog signal for the purpose of transmitting or storing the signal in digital form.

Quantization • Maps each sample amplitude, which may take any value in a continuous range, to the nearest of a finite set of levels, so that the sample can be represented by a b-bit code word.

Linear/Uniform Quantization
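A minimal Python sketch of a b-bit uniform quantizer over [-X_max, X_max] may help make the staircase characteristic concrete. The function name `uniform_quantize`, its parameters, and the choice to clip out-of-range inputs to the outermost level are assumptions made for illustration; the slides do not specify an implementation.

```python
import math


def uniform_quantize(x, x_max, bits, midrise=True):
    """Map x to one of 2**bits uniformly spaced levels spanning [-x_max, x_max]."""
    levels = 2 ** bits
    half = levels // 2
    step = 2.0 * x_max / levels  # quantization step size
    if midrise:
        # Midrise: no output level at zero; outputs are odd multiples of step/2.
        index = math.floor(x / step)
        index = max(-half, min(half - 1, index))  # clip overloaded inputs
        return (index + 0.5) * step
    # Midtread: one output level sits exactly at zero.
    index = math.floor(x / step + 0.5)
    index = max(-half, min(half - 1, index))  # clip overloaded inputs
    return index * step


if __name__ == "__main__":
    for sample in (-1.2, -0.3, 0.0, 0.1, 0.6, 1.2):
        print(sample,
              uniform_quantize(sample, 1.0, 3),
              uniform_quantize(sample, 1.0, 3, midrise=False))
```

With an even number of levels, the midtread variant in this sketch ends up with one more negative level than positive, which is a common convention but still an assumption here.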

Quantization Error/Noise • Granular noise: the error while the input stays inside the quantizer's range • Overload noise: the error when the input exceeds the range at either end

Quantization Error/Noise • Δ: quantization step size • Figure panels: unquantized sine wave; 3-bit quantization waveform; 3-bit quantization error; 8-bit quantization error

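The Model of Quantization Noise • $\hat{x}(n) = x(n) + e(n)$: the quantized sample is the input plus an additive noise term • $-\Delta/2 \le e(n) \le \Delta/2$, where $\Delta$ is the quantization step size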
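The additive-noise view can be checked numerically. The sketch below quantizes a sine wave with 3 and 8 bits, then reports the peak error and the error variance next to the bound Δ/2 and the value Δ²/12 that a uniformly distributed error would have. The helper names, the midrise quantizer, the 0.99 amplitude, and the sine frequency are all assumptions chosen for illustration.

```python
import math


def quantize_midrise(x, step, bits):
    """Midrise uniform quantizer with clipping to the outermost levels."""
    half = 2 ** bits // 2
    index = max(-half, min(half - 1, math.floor(x / step)))
    return (index + 0.5) * step


def error_stats(bits, n=10000, x_max=1.0):
    """Return (peak |e|, variance of e, step) for a quantized sine wave."""
    step = 2.0 * x_max / 2 ** bits
    errors = []
    for k in range(n):
        # Amplitude just below x_max so the quantizer is never overloaded.
        x = 0.99 * x_max * math.sin(2.0 * math.pi * 0.01234 * k)
        errors.append(quantize_midrise(x, step, bits) - x)
    peak = max(abs(e) for e in errors)
    variance = sum(e * e for e in errors) / n
    return peak, variance, step


if __name__ == "__main__":
    for bits in (3, 8):
        peak, variance, step = error_stats(bits)
        print(f"b={bits}: peak={peak:.5f} (step/2={step / 2:.5f}), "
              f"variance={variance:.3e} (step^2/12={step * step / 12:.3e})")
```

At 3 bits the error is visibly correlated with the sine wave, so the Δ²/12 figure is only a rough guide there. At 8 bits the error looks much more like noise.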

Signal-to-Quantization-Noise Ratio (SQNR) • A measure of the effect of the quantization errors introduced by analog-to-digital conversion at the ADC. • $\mathrm{SQNR} = \sigma_x^2 / \sigma_e^2$, the ratio of signal power to quantization-noise power. • Assume the error $e(n)$ is uniformly distributed over $[-\Delta/2, \Delta/2]$, so that $\sigma_e^2 = \Delta^2/12$, with $\Delta = 2X_{\max}/2^b$. Is the assumption always appropriate? • Then $\mathrm{SQNR\,(dB)} = 6.02\,b + 4.77 - 20\log_{10}(X_{\max}/\sigma_x)$. • Each code bit contributes 6 dB; 4.77 dB is a constant. • The term $X_{\max}/\sigma_x$ tells how big a signal can be accurately represented. • $X_{\max}$ is determined by the A/D converter. • $\sigma_x$ depends on the distribution of the signal, which, in turn, depends on users and time. • Under what conditions is the formula reasonable?
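One way to explore when the SQNR formula is reasonable is to compare the textbook prediction, SQNR(dB) = 6.02 b + 4.77 - 20 log10(X_max/σ_x), with a Monte Carlo measurement. The sketch below does this under stated assumptions: a zero-mean Gaussian input, a midrise quantizer that clips, and the hypothetical helper names `predicted_sqnr_db` and `measured_sqnr_db`.

```python
import math
import random


def predicted_sqnr_db(bits, x_max, sigma_x):
    """Textbook SQNR for a uniform quantizer with uniformly distributed error."""
    return 6.02 * bits + 4.77 - 20.0 * math.log10(x_max / sigma_x)


def measured_sqnr_db(bits, x_max, sigma_x, n=50000, seed=0):
    """Estimate SQNR by quantizing n zero-mean Gaussian samples."""
    rng = random.Random(seed)
    step = 2.0 * x_max / 2 ** bits
    half = 2 ** bits // 2
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma_x)
        index = max(-half, min(half - 1, math.floor(x / step)))  # clips on overload
        x_hat = (index + 0.5) * step
        signal_power += x * x
        noise_power += (x_hat - x) ** 2
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for sigma in (1.0, 0.5, 0.25):
        print(sigma,
              round(predicted_sqnr_db(8, 4.0, sigma), 2),
              round(measured_sqnr_db(8, 4.0, sigma), 2))
```

In this setup each halving of σ_x should cost roughly 6 dB, which is the dependence on signal level that motivates nonuniform quantization.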

Overload Distortion • Quantizer types: midtread, midrise

Probability of Distortion • Quantizer types: midtread, midrise

Overload and Quantization Noise with Gaussian Input pdf and b = 4 • Quantizer types: midtread, midrise
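For a rough feel of how often overload occurs, the sketch below evaluates P(|x| > X_max) for a zero-mean Gaussian input with standard deviation σ_x. The Gaussian model is an assumption taken from the input pdf named in the slide title, and `overload_probability` is a hypothetical helper name.

```python
import math


def overload_probability(x_max, sigma_x):
    """P(|x| > x_max) for a zero-mean Gaussian input with std sigma_x."""
    return math.erfc(x_max / (sigma_x * math.sqrt(2.0)))


if __name__ == "__main__":
    # Loading factor X_max / sigma_x from 1 to 5.
    for loading in (1.0, 2.0, 3.0, 4.0, 5.0):
        print(loading, overload_probability(loading, 1.0))
```

A larger X_max/σ_x makes overload rarer but, for a fixed number of bits, widens the step size and so raises the granular noise. The trade-off between the two is what the b = 4 slide explores.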

Uniform Quantizer Performance • Figure panels: uniform input pdf; Gaussian input pdf

More on Uniform Quantization • Conceptually simple and simple to implement. • Imposes no restrictions on the signal's statistics. • Maintains a constant maximum error across its total dynamic range. • However, $\sigma_x$ varies greatly (on the order of 40 dB) across sounds, speakers, and input conditions. • We need a quantizing system whose SQNR is independent of the signal's dynamic range, i.e., a near-constant SQNR across its dynamic range.

Nonlinear PCM

Probability Density Functions of Speech Signals • Counting the number of samples in each amplitude interval provides an estimate of the pdf of the signal. • A good approximation is a gamma distribution, of the form $p(x) = \left(\frac{\sqrt{3}}{8\pi\sigma_x |x|}\right)^{1/2} e^{-\sqrt{3}\,|x|/(2\sigma_x)}$. • A simpler approximation is a Laplacian density, of the form $p(x) = \frac{1}{\sqrt{2}\,\sigma_x}\, e^{-\sqrt{2}\,|x|/\sigma_x}$. • The distributions are normalized so that $\mu_x = 0$ and $\sigma_x = 1$. • The gamma density approximates the measured distribution for speech more closely than the Laplacian. • The Laplacian is still a good model in analytical studies. • Small amplitudes are much more likely than large amplitudes, by a 100:1 ratio.
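The counting procedure and the two model densities can be sketched in a few lines of Python. Since no speech data accompanies the slides, the example draws synthetic Laplacian samples as a stand-in; that substitution, the standard textbook forms used for the gamma and Laplacian densities, and all function names are assumptions.

```python
import math
import random


def laplacian_pdf(x, sigma_x):
    """Laplacian density with zero mean and standard deviation sigma_x."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma_x) / (math.sqrt(2.0) * sigma_x)


def gamma_pdf(x, sigma_x):
    """Gamma density used as a speech model (unbounded at x = 0)."""
    scale = math.sqrt(math.sqrt(3.0) / (8.0 * math.pi * sigma_x * abs(x)))
    return scale * math.exp(-math.sqrt(3.0) * abs(x) / (2.0 * sigma_x))


def laplacian_sample(rng, sigma_x):
    """Draw one Laplacian sample: exponential magnitude with a random sign."""
    magnitude = rng.expovariate(math.sqrt(2.0) / sigma_x)
    return magnitude * rng.choice((-1.0, 1.0))


def histogram_pdf(samples, lo, hi, bins):
    """Estimate the pdf by counting the samples that fall in each interval."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            counts[min(bins - 1, int((s - lo) / width))] += 1
    return [c / (len(samples) * width) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [laplacian_sample(rng, 1.0) for _ in range(20000)]
    estimate = histogram_pdf(data, -4.0, 4.0, 40)
    for i in (20, 25, 30, 35):
        centre = -4.0 + (i + 0.5) * 0.2
        print(round(centre, 1), round(estimate[i], 4),
              round(laplacian_pdf(centre, 1.0), 4), round(gamma_pdf(centre, 1.0), 4))
```

Replacing the synthetic `data` list with real speech samples, normalized to zero mean and unit variance, would give the kind of comparison the slides describe.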

Companding • The dynamic range of signals is compressed before transmission and is expanded to the original value at the receiver. • Allowing signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic range capability. • Companding reduces the noise and crosstalk levels at the receiver.

Companding • Block diagram: compressor → uniform quantizer → expander • After compression, y is nearly uniformly distributed.

The Quantization-Error Variance of a Nonuniform Quantizer (Jayant and Noll) • Block diagram: compressor → uniform quantizer → expander

The Optimal C(x) (Jayant and Noll) • If the signal's pdf is known, then the maximum SQNR is achievable by choosing the compressor C(x) to match that pdf. • Is the assumption realistic? • Block diagram: compressor → uniform quantizer → expander

PDF-Independent Nonuniform Quantization • Assuming overload-free operation, we require that the SQNR be independent of p(x).
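A short derivation suggests why this requirement leads to a logarithmic compressor. It is a sketch under assumptions not stated on the slide: the standard high-resolution (Bennett) approximation for the noise of a compander, a zero-mean input with no overload, and a compressor slope $C'(x) = K/|x|$ with an arbitrary constant $K$.

```latex
% Assumption: high-resolution approximation for y = C(x) followed by a
% uniform quantizer with step \Delta.
\sigma_e^2 \approx \frac{\Delta^2}{12}
    \int_{-X_{\max}}^{X_{\max}} \frac{p(x)}{[C'(x)]^2}\,dx

% Choose a logarithmic compressor, C'(x) = K/|x| with K > 0:
\sigma_e^2 \approx \frac{\Delta^2}{12K^2}
    \int_{-X_{\max}}^{X_{\max}} x^2\,p(x)\,dx
    = \frac{\Delta^2\,\sigma_x^2}{12K^2}

% The signal variance cancels, so the ratio no longer depends on p(x):
\mathrm{SQNR} = \frac{\sigma_x^2}{\sigma_e^2} \approx \frac{12K^2}{\Delta^2}
```

A pure logarithm is unbounded at x = 0, so practical compressors use a modified characteristic near the origin.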

Logarithmic Companding

μ-Law & A-Law Companding • μ-Law: a North American PCM standard, used by North America and Japan (μ = 255 in U.S. and Canada) • A-Law: an ITU PCM standard, used by Europe (A = 87.56 in Europe)

μ-Law & A-Law Companding

μ-Law Companding
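The following sketch illustrates μ-law companding using the continuous textbook characteristic y = sgn(x) · X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) with μ = 255. It is not the segmented 8-bit encoding used in deployed telephony codecs. The Gaussian test input, the midrise quantizer, and the function names are assumptions for illustration.

```python
import math
import random

MU = 255.0


def mu_law_compress(x, x_max=1.0, mu=MU):
    """Continuous mu-law compressor."""
    return math.copysign(x_max * math.log1p(mu * abs(x) / x_max) / math.log1p(mu), x)


def mu_law_expand(y, x_max=1.0, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(x_max * math.expm1(abs(y) * math.log1p(mu) / x_max) / mu, y)


def uniform_quantize(y, x_max, bits):
    """Midrise uniform quantizer with clipping."""
    step = 2.0 * x_max / 2 ** bits
    half = 2 ** bits // 2
    index = max(-half, min(half - 1, math.floor(y / step)))
    return (index + 0.5) * step


def sqnr_db(sigma_x, companded, bits=8, n=20000, seed=0):
    """Measured SQNR for a zero-mean Gaussian input, with or without companding."""
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma_x)
        if companded:
            x_hat = mu_law_expand(uniform_quantize(mu_law_compress(x), 1.0, bits))
        else:
            x_hat = uniform_quantize(x, 1.0, bits)
        signal_power += x * x
        noise_power += (x_hat - x) ** 2
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for sigma in (0.3, 0.1, 0.03, 0.01):
        print(sigma, round(sqnr_db(sigma, False), 1), round(sqnr_db(sigma, True), 1))
```

Across this range of input levels, the companded SQNR should vary by only a few dB, while the uniform quantizer loses about 6 dB for every halving of σ_x.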
