Speech Coding

Speech Coding • Waveform Coding • Vocoders • Middle Term Evaluation

Waveform Coding • In the time domain • PCM • Delta PCM (DPCM) • Adaptive DPCM • In the Frequency Domain • Filterbank spectrum Analyser • Subband coding • Adaptive Transform Coding • Vector Waveform Quantisation

Pulse Code Modulation (PCM) • Uniform quantiser • Nonuniform quantiser • Output: an encoded bit stream such as 11010010....

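Uniform Quantiser • Each sample of the signal is quantised to one of 2^R amplitude values, where R is the number of bits per sample. • The signal is sampled every T = 1/Fs, with Fs > 2Wc, where Wc is the highest frequency in the spectrum S(w). • Rate = R·Fs. • [Figure: sampled waveform s(n) against four equally spaced levels (00, 01, 10, 11), the band-limited spectrum S(w), and the encoded bit stream 11010010....]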
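A minimal sketch of the uniform quantiser described above, assuming samples normalised to [-x_max, x_max] and a mid-rise characteristic. The function names and the `x_max` parameter are illustrative and do not come from the slides.

```python
def uniform_quantise(x, n_bits, x_max=1.0):
    """Map a sample to one of 2**n_bits level indices (0 .. 2**n_bits - 1)."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels          # equal spacing between levels
    index = int((x + x_max) // step)
    return max(0, min(levels - 1, index))  # saturate out-of-range samples


def uniform_dequantise(index, n_bits, x_max=1.0):
    """Reconstruct the mid-point of the cell addressed by index."""
    step = 2.0 * x_max / (2 ** n_bits)
    return -x_max + (index + 0.5) * step


def bit_rate(n_bits, fs_hz):
    """Rate = R * Fs, in bits per second."""
    return n_bits * fs_hz
```

For example, `bit_rate(8, 8000)` gives 64000, and for in-range samples the reconstruction error never exceeds half a step.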

Nonuniform Quantiser • A-law • μ-law • Rate = R·Fs • [Figure: sampled waveform s(n), T = 1/Fs, against unequally spaced levels (00, 01, 10, 11), and the encoded bit stream 11010011....]
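As an illustration of nonuniform quantisation, the following sketch implements the continuous μ-law compression curve and its inverse. The value μ = 255 and the function names are assumptions, since the slide gives no formula. A uniform quantiser applied to the compressed value then behaves like a nonuniform quantiser on the original sample.

```python
import math

MU = 255.0  # assumed value; the slide does not state it


def mu_compress(x, mu=MU):
    """Compress x in [-1, 1]: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Inverse of mu_compress: sign(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Small amplitudes are stretched before quantisation (an input of 0.01 maps to roughly 0.23), so quiet passages get finer effective steps than loud ones.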

Delta PCM • Successive speech samples are highly correlated, so the average amplitude change between successive samples is very small. • Therefore, encoding the differences between successive samples requires fewer bits than encoding the samples themselves.

DPCM coder-decoder • Goal: decorrelate the speech signal; therefore, a simple long-term LP predictor is enough. • [Block diagram. Encoder: the predictor (LP analyser) output ^s~(n) is subtracted from s(n) to give e(n), which the quantiser turns into e~(n). Channel. Decoder: the received error is added to the predictor (LP analyser) output ^s~(n) to give s~(n).]

DPCM coder-decoder (this version ensures that the error in s~(n) is only the quantisation error) • [Block diagram. Encoder: e(n) = s(n) - ^s~(n) is quantised to e~(n); e~(n) is added back to ^s~(n) to form s~(n), the sampled signal modified by the quantisation process, which drives the predictor (LP analyser). Channel. Decoder: the received error is added to ^s~(n) to give s~(n), which drives the decoder's own predictor (LP analyser).]
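The closed-loop arrangement can be sketched as follows, assuming a fixed first-order predictor with coefficient `a` and a uniform quantiser of step `step`. Both are illustrative choices, not values from the slides. Because the encoder predicts from the reconstructed signal rather than the input, the decoder reproduces the same prediction and the quantisation error does not accumulate.

```python
def dpcm_encode(samples, step, a=0.9):
    """Return integer codes for the quantised prediction error."""
    codes = []
    recon_prev = 0.0                      # previous reconstructed sample s~(n-1)
    for s in samples:
        pred = a * recon_prev             # prediction ^s~(n)
        e = s - pred                      # prediction error e(n)
        code = round(e / step)            # uniform quantiser
        codes.append(code)
        recon_prev = pred + code * step   # s~(n) = ^s~(n) + e~(n)
    return codes


def dpcm_decode(codes, step, a=0.9):
    """Rebuild s~(n) with the same predictor as the encoder."""
    out = []
    recon_prev = 0.0
    for code in codes:
        recon_prev = a * recon_prev + code * step
        out.append(recon_prev)
    return out
```

With this structure, every decoded sample differs from the input by at most `step / 2`, provided the codes are not clipped.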

DPCM coder-decoder (improved version) • [Block diagram. Encoder blocks: summing junctions, quantiser, all-zero linear filter, and predictor (LP analyser); signals s(n), e(n), e~(n), ^s~(n), s~(n). Channel. Decoder blocks: summing junctions and two predictor (LP analyser) blocks; signals e(n), ^s~(n), s~(n).]

Adaptive PCM and DPCM • PCM and DPCM assume that the speech signal is stationary. • The coding process can be improved by treating speech as quasi-stationary. • One improvement is to use an adaptive quantiser.

Adaptive DPCM • [Block diagram. Encoder blocks: summing junctions, quantiser with step-size adaptation, all-zero linear filter, predictor (LP analyser), and predictor adaptor; signals s(n), e(n), e~(n), ^s~(n), s~(n). Channel. Decoder blocks: summing junctions and predictor (LP analyser) blocks; signals e(n), ^s~(n), s~(n).]

Vocoders • Channel vocoder • Cepstral vocoder • Phase vocoder • Formant vocoder • Linear prediction coder.

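Cepstral Vocoder • [Block diagram. Encoder: s(n) passes through stDFT -> Log|.| -> IDFT -> “low-time” lifter w(n); a pitch estimator runs in parallel on s(n). Channel. Decoder: DFT -> Exp(.) -> IDFT gives the response that is convolved with the excitation, which the V/U switch selects from a pitch pulse generator or a noise generator.]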
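The encoder and decoder chains in the diagram can be sketched with a naive DFT, assuming a single real-valued frame, a symmetric rectangular lifter, and a small floor to avoid log(0). All of these are illustrative choices, and the helper names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Encoder side: DFT -> log|.| -> IDFT."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]


def low_time_lifter(cep, n_keep):
    """Keep the first n_keep quefrencies and their mirror images; zero the rest."""
    n = len(cep)
    return [c if (i < n_keep or i > n - n_keep) else 0.0 for i, c in enumerate(cep)]


def envelope_from_cepstrum(cep):
    """Decoder side: DFT -> exp(.) gives the smoothed magnitude spectrum."""
    return [math.exp(v.real) for v in dft(cep)]
```

Keeping only the low-time part of the cepstrum retains the slowly varying spectral envelope (vocal tract) and discards the fine harmonic structure, which is carried separately by the pitch estimate.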

Linear Prediction in Speech Coding • Introduction • Generalities • Methods

Introduction • These speech coders are called vocoders (voice coders). • They usually provide more bandwidth compression than is possible with waveform coding (2400-9600 bps). • Basic idea: estimate parameters -> encode parameters -> transmit parameters -> decode parameters -> synthesise speech.

Generalities • LP Model • Parameter Estimation • Typical Memory requirements

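LP Model • [Block diagram. A voiced/unvoiced switch selects the excitation: an impulse generator driven by the pitch period (voiced) or a white noise generator (unvoiced). The excitation is scaled by the gain and passed through an all-pole filter, which stands for the glottal filter, vocal tract filter, and lip radiation filter, to produce the speech signal.]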

Parameter Estimation • For each frame: • Estimate the LP coefficients (the a_i). • Estimate the gain. • Estimate the type of excitation (voiced or unvoiced). • Estimate the pitch.
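One standard way to estimate the LP coefficients and gain for a frame is the autocorrelation method with the Levinson-Durbin recursion, sketched below. The slides do not say which estimation method is intended here, so this choice is an assumption. The predictor convention is s(n) ≈ sum_i a_i s(n-i).

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0 .. max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, k, err): LP coefficients, reflection coefficients, residual energy."""
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= (1.0 - k[i] ** 2)          # residual energy shrinks at each order
    return a[1:], k[1:], err
```

The gain can then be taken as the square root of the residual energy `err`. The recursion also yields the reflection coefficients as a by-product.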

Typical Memory Requirements • Pitch (6 bits). • Gain (5 bits). • Model parameters: • LP coefficients (8-10 bits): small changes in the LP coefficients result in large changes in the pole positions. • Reflection coefficients (6 bits): if |r_k| is near 1, quantisation causes large distortion. • Log-area ratios: a non-linear transformation of the reflection coefficients that expands the scale near |r_k| = 1.
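The log-area ratio transformation can be sketched as below, assuming the common definition g_k = ln((1 + r_k) / (1 - r_k)). Sign conventions differ between texts, and the slides give no formula, so the definition used here is an assumption.

```python
import math


def reflection_to_lar(r):
    """Log-area ratio of a reflection coefficient with |r| < 1."""
    return math.log((1.0 + r) / (1.0 - r))


def lar_to_reflection(g):
    """Inverse mapping: r = (e^g - 1) / (e^g + 1) = tanh(g / 2)."""
    return math.tanh(g / 2.0)
```

Under this mapping, the interval from 0.98 to 0.99 spans about 0.70 in the LAR domain, while the interval from 0.09 to 0.10 spans only about 0.02, so a uniform quantiser on the LARs spends its resolution where the reflection coefficients are most sensitive.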

Methods • Introduction • LPC-10 • Analysis-by-Synthesis

Introduction • The main difference between LP vocoders is how the excitation source is calculated.

LPC-10 • [Block diagram. Encoder: the speech signal is sampled by an ADC (8 kHz) and windowed (180 samples); the pitch frequency (7 bits) and the voiced/unvoiced switch (1 bit) come from AMDF and zero-crossing measurements; LP analysis (covariance method) yields reflection coefficients (4 bits) and, after non-linear warping, LAR coefficients (4 bits and 5 bits). Channel. Decoder: receives 10 reflection coefficients (5 bits for one and 4 bits for the others), the pitch period (7 bits), the gain (5 bits), and the voiced/unvoiced switch (1 bit); an impulse generator or a white noise generator excites 1/A(z) to give the synthesised speech signal.]
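The AMDF (average magnitude difference function) pitch estimator named in the diagram can be sketched as follows. The lag search range of 20 to 156 samples is an assumption, and the zero-crossing voicing decision is left out.

```python
def amdf(frame, lag):
    """Average magnitude difference between the frame and itself delayed by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_pitch_lag(frame, min_lag=20, max_lag=156):
    """Return the lag (in samples) with the deepest AMDF valley.

    Multiples of the true period also give deep valleys; on ties the
    smallest lag wins because min() keeps the first minimum it meets.
    """
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))
```

The bit allocations listed in the diagram add up as 7 + 1 + 5 + (5 + 9 x 4) = 54 bits per 180-sample frame. At 8 kHz a frame lasts 22.5 ms, which gives 2400 bps.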

Analysis-by-Synthesis Methods • Introduction • Multipulse LPC Vocoder • Regular Excited Linear Prediction (RELP) • Code Excited Linear Prediction (CELP) Vocoder

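Introduction • [Block diagram of the analysis-by-synthesis encoder: sampled speech -> buffer and LP analysis; multipulse excitation generator -> LP synthesis filter; the error between the original and synthesised speech passes through a perceptual weighting filter to error minimisation, which controls the excitation generator.]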

Multipulse LPC vocoder • The multipulse excitation is a short sequence of pulses chosen to minimise the energy of the perceptually weighted error. • For simplicity, the amplitudes and locations of the pulses are obtained sequentially, minimising the error energy for one pulse at a time. • In practice, 4-8 pulses are calculated every 5 ms.
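The sequential search can be sketched as a greedy loop, assuming the target is the perceptually weighted speech for one block and `h` is the impulse response of the weighted synthesis filter. Both are supplied by the caller here, and the function names are illustrative. At each step the pulse that removes the most error energy is chosen, then its contribution is subtracted.

```python
def shifted_response(h, pos, n):
    """Impulse response h placed at position pos, truncated to n samples."""
    out = [0.0] * n
    for i, v in enumerate(h):
        if pos + i < n:
            out[pos + i] = v
    return out


def multipulse_search(target, h, n_pulses):
    """Return ([(position, amplitude), ...], final residual)."""
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(n_pulses):
        best = None
        for pos in range(n):
            hm = shifted_response(h, pos, n)
            energy = sum(v * v for v in hm)
            if energy == 0.0:
                continue
            corr = sum(r * v for r, v in zip(residual, hm))
            reduction = corr * corr / energy      # error energy removed by this pulse
            if best is None or reduction > best[0]:
                best = (reduction, pos, corr / energy, hm)
        _, pos, amp, hm = best
        pulses.append((pos, amp))
        residual = [r - amp * v for r, v in zip(residual, hm)]
    return pulses, residual
```

With a 40-sample block (5 ms at 8 kHz) and `n_pulses` between 4 and 8, this matches the update rate quoted above. Earlier amplitudes are not revisited, which is what makes the search simple but suboptimal.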

Multipulse LPC • [Block diagram. Encoder: sampled speech -> buffer and LP analysis, giving 10-12 reflection coefficients (5 bits) and pitch filter parameters (6 bits); the multipulse excitation generator feeds the pitch synthesis filter and the LP synthesis filter; the error passes through the perceptual weighting filter to error minimisation, which yields the scale factor (6 bits), the pulses' locations (4 bits), and the pulses' amplitudes (4 bits). Channel. Decoder: the same parameters drive an excitation generator, E(z), and 1/A(z) to give the synthesised speech signal.]

Memory Requirements • Updated every 5 ms: • Scale factor (largest amplitude), log quantised: 6 bits. • Pulse amplitudes (relative to the largest one), linearly quantised: 4 bits. • Updated every 20 ms: • Vocal tract parameters (reflection coefficients): 6 bits. • Pitch period: 6 bits.

Multipulse LPC vocoders are effective for good-quality speech at 9600 bps. • They have been used for airborne mobile satellite telephone service.

Variations • Every time a new pulse location and amplitude are obtained, one can go back and reoptimise the amplitudes of the previous pulses. • Alternatively, jointly optimise all the amplitudes after all locations have been determined.

Code Excited Linear Prediction (CELP) Vocoder • The excitation signal is selected from a codebook of zero-mean Gaussian sequences. • LP coefficients are calculated around every 20 ms.

CELP • [Block diagram. Encoder: sampled speech -> buffer and LP analysis, giving 10-12 reflection coefficients (5 bits) and pitch filter parameters (6 bits); the Gaussian excitation codebook feeds the pitch synthesis filter and the LP synthesis filter; the error passes through the perceptual weighting filter to error minimisation, which yields the gain factor (6 bits) and the index of the excitation sequence (4 bits). Channel. Decoder: pitch filter parameters (6 bits), 10-12 reflection coefficients (5 bits), gain factor (6 bits), and pulses' amplitude (4 bits) drive an excitation generator, E(z), and 1/A(z) to give the synthesised speech signal.]
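The codebook search at the heart of the encoder can be sketched as follows. For brevity, this version omits the pitch synthesis filter and the perceptual weighting filter, uses a seeded pseudo-random Gaussian codebook, and takes the LP coefficients `a` as given. These simplifications and all the names are assumptions, not details from the slides.

```python
import random


def make_codebook(n_entries, length, seed=0):
    """Codebook of zero-mean, unit-variance Gaussian sequences."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(n_entries)]


def synthesise(excitation, a):
    """All-pole filter 1/A(z): y(n) = x(n) + sum_i a[i-1] * y(n-i)."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


def celp_search(target, codebook, a):
    """Return (index, gain) of the entry whose scaled synthesis best matches target."""
    best_index, best_gain, best_err = -1, 0.0, float("inf")
    target_energy = sum(t * t for t in target)
    for index, code in enumerate(codebook):
        y = synthesise(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        err = target_energy - corr * corr / energy   # error with the optimal gain
        if err < best_err:
            best_index, best_gain, best_err = index, corr / energy, err
    return best_index, best_gain
```

Only the index and the gain need to be transmitted, because the decoder holds the same codebook and regenerates the excitation from them.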

With a codebook of 1024 sequences, toll-quality speech can be obtained. • Rate: around 4.8 kbps.

Variations • Low-Delay CELP • VSELP

Topics to Evaluate • Vocal Tract Physiological Model. • Linear Prediction (LP). • Relationship between the Vocal Tract Physiological Model and LP. • Filterbank and Signal Processing. • HMM: • Basics. • Applied to Speech Recognition. • Parameter re-estimation.
