
Speech Coding


inigo

Presentation Transcript


  1. Speech Coding • Waveform Coding • Vocoders • Middle Term Evaluation

  2. Waveform Coding • In the time domain • PCM • Differential PCM (DPCM) • Adaptive DPCM • In the Frequency Domain • Filterbank spectrum Analyser • Subband coding • Adaptive Transform Coding • Vector Waveform Quantisation

  3. Pulse Code Modulation (PCM) • Each sample is quantised and encoded as a binary code word, giving a bit stream such as 11010010... • Uniform quantiser • Nonuniform quantiser

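  4. Uniform Quantiser • Each sample of the signal is quantised to one of $2^R$ amplitude levels (R bits per sample). • Sampling period $T = 1/F_s$, with $F_s > 2W_c$, where $W_c$ is the highest frequency in the spectrum $S(\omega)$. • Bit rate $= R F_s$. [Figure: sampled waveform $s_n$ with 2-bit levels 00, 01, 10, 11, its spectrum $S(\omega)$, and the encoded bit stream 11010010...]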
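A minimal Python sketch of an R-bit uniform quantiser, assuming a mid-rise characteristic and inputs normalised to [-x_max, x_max]. The function names `uniform_quantise`, `uniform_dequantise` and `bit_rate` are illustrative, not taken from the slides.

```python
import math

def uniform_quantise(x, R, x_max=1.0):
    """Return the R-bit code (0 .. 2**R - 1) of x for a mid-rise uniform quantiser on [-x_max, x_max]."""
    levels = 2 ** R
    step = 2.0 * x_max / levels                  # quantisation step size
    code = int(math.floor((x + x_max) / step))   # index of the cell containing x
    return min(max(code, 0), levels - 1)         # clip out-of-range inputs to the end cells

def uniform_dequantise(code, R, x_max=1.0):
    """Reconstruct the amplitude at the centre of the selected cell."""
    step = 2.0 * x_max / 2 ** R
    return -x_max + (code + 0.5) * step

def bit_rate(R, fs):
    """Rate = R * Fs bits per second."""
    return R * fs
```

With R = 8 and Fs = 8 kHz, `bit_rate` gives 64000 bit/s, the rate of standard telephone PCM; inside the quantiser range the reconstruction error never exceeds half a step.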

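  5. Nonuniform Quantiser • The quantisation levels are spaced nonuniformly, following the A-law or μ-law characteristic (finer steps for small amplitudes). • As for the uniform quantiser, $T = 1/F_s$ and bit rate $= R F_s$. [Figure: sampled waveform $s_n$ with nonuniform 2-bit levels 00, 01, 10, 11 and the encoded bit stream 11010011...]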
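A nonuniform quantiser can be sketched by companding: compress with a logarithmic curve, quantise uniformly, then expand. This sketch assumes the μ-law curve with μ = 255 (the A-law curve works analogously); the helper names are hypothetical.

```python
import math

MU = 255.0  # value used in North American and Japanese telephony

def mu_law_compress(x, mu=MU):
    """F(x) = sgn(x) ln(1 + mu|x|) / ln(1 + mu), for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)

def nonuniform_quantise(x, R, mu=MU):
    """Compress, quantise uniformly with R bits (mid-rise), then expand: returns (code, x_hat)."""
    levels = 2 ** R
    step = 2.0 / levels
    y = mu_law_compress(x, mu)
    code = min(max(int(math.floor((y + 1.0) / step)), 0), levels - 1)
    y_hat = -1.0 + (code + 0.5) * step
    return code, mu_law_expand(y_hat, mu)
```

Small amplitudes get much finer effective steps: for x = 0.01 and R = 8, the companded reconstruction error is well below the half-step error bound of an 8-bit uniform quantiser on [-1, 1].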

  6. Differential PCM (DPCM) • Successive speech samples are highly correlated, so the amplitude change between successive samples is, on average, very small. • Therefore, encoding the differences between successive samples requires fewer bits than encoding the samples themselves.
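A small numerical check of this argument, assuming a 200 Hz tone sampled at 8 kHz as a stand-in for correlated speech samples: the first difference d(n) = s(n) - s(n-1) carries far less energy than s(n). The names `first_difference` and `gain_db` are illustrative.

```python
import math

def mean_energy(x):
    return sum(v * v for v in x) / len(x)

def first_difference(s):
    """d(n) = s(n) - s(n-1), taking s(-1) = 0."""
    return [s[0]] + [s[n] - s[n - 1] for n in range(1, len(s))]

# A 200 Hz tone sampled at 8 kHz stands in for highly correlated speech samples (assumption).
fs, f0 = 8000, 200
s = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]
d = first_difference(s)
gain_db = 10 * math.log10(mean_energy(s) / mean_energy(d))  # about 16 dB for this tone
```

Since each extra bit of a uniform quantiser buys about 6 dB of SNR, a gain of this size means that, for this toy signal, the differences could be coded with a couple of bits fewer per sample at the same step size.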

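  7. DPCM coder-decoder • Goal: decorrelate the speech signal. Therefore, a simple long-term LP predictor is enough. [Block diagram: the encoder forms the prediction error e(n) = s(n) - ^s~(n), where ^s~(n) is the output of the predictor (LP analyser), and quantises it to e~(n) for the channel; the decoder adds the prediction ^s~(n) from an identical predictor (LP analyser) to the received error to form s~(n).]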

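  8. DPCM coder-decoder (this version ensures that the error in s~(n) is only the quantisation error) • In the encoder, the predictor (LP analyser) operates on s~(n) = ^s~(n) + e~(n), the sampled signal as modified by the quantisation process, so encoder and decoder form the same prediction ^s~(n). [Block diagram: encoder s(n) minus ^s~(n) gives e(n), which the quantiser turns into e~(n) for the channel, with s~(n) fed back to the predictor; decoder s~(n) = e~(n) + ^s~(n) using an identical predictor.]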
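The closed-loop structure of this version can be sketched as follows, assuming a first-order predictor ^s~(n) = a·s~(n-1) with a fixed a = 0.9 and a uniform quantiser with unlimited levels (both simplifications, not values from the slides). Because the encoder predicts from s~(n), exactly as the decoder does, s(n) - s~(n) equals the quantisation error of e(n) and stays within half a step.

```python
import math

def dpcm_encode(s, a=0.9, step=0.05):
    """Closed-loop DPCM: the predictor works on the reconstructed signal s~(n)."""
    prev, codes = 0.0, []
    for x in s:
        pred = a * prev                       # ^s~(n), predicted from the past reconstruction
        idx = round((x - pred) / step)        # integer code of the quantised prediction error
        codes.append(idx)
        prev = pred + idx * step              # s~(n): identical to what the decoder computes
    return codes

def dpcm_decode(codes, a=0.9, step=0.05):
    prev, out = 0.0, []
    for idx in codes:
        prev = a * prev + idx * step          # s~(n) = ^s~(n) + quantised error
        out.append(prev)
    return out

# Example (illustrative): a 300 Hz tone sampled at 8 kHz.
s = [0.8 * math.sin(2 * math.pi * 300 * n / 8000) for n in range(400)]
rec = dpcm_decode(dpcm_encode(s))
max_err = max(abs(x - y) for x, y in zip(s, rec))  # never exceeds step / 2
```

In an open-loop arrangement, where the encoder predicts from s(n) instead, the decoder's prediction would drift away from the encoder's and quantisation errors could accumulate.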

  9. DPCM coder-decoder (improved version) [Block diagram: as in the previous version, but the prediction ^s~(n) is formed by combining an all-zero linear filter with the predictor (LP analyser) operating on s~(n); the decoder mirrors this structure.]

  10. Adaptive PCM and DPCM • PCM and DPCM assume the speech signal is stationary. • The coding process can be improved by assuming instead that speech is quasi-stationary. • One such improvement is to use an adaptive quantiser.

  11. Adaptive DPCM [Block diagram: the improved DPCM coder-decoder (all-zero linear filter plus predictor) extended in the encoder with step-size adaptation of the quantiser and a predictor adaptation block; the decoder mirrors the predictor structure.]
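A minimal sketch of backward step-size adaptation in the spirit of Jayant's adaptive quantiser: a 2-bit quantiser whose step is multiplied by 0.8 after an inner-level code and by 1.6 after an outer-level code. The multipliers, step limits and the fixed first-order predictor are illustrative assumptions (predictor adaptation is omitted). Because the step is updated only from transmitted codes, the decoder tracks it without side information.

```python
import math

MULTIPLIERS = (0.8, 1.6)   # illustrative step multipliers for inner / outer levels (assumption)
STEP_MIN, STEP_MAX = 1e-3, 0.5

def next_step(step, k):
    """Backward adaptation: the new step depends only on the transmitted magnitude index k."""
    return min(max(step * MULTIPLIERS[k], STEP_MIN), STEP_MAX)

def adpcm_encode(s, a=0.9, step0=0.02):
    """2-bit ADPCM with a fixed first-order predictor; returns codes and the local reconstruction."""
    step, prev, codes, rec = step0, 0.0, [], []
    for x in s:
        pred = a * prev                              # prediction from the reconstructed signal
        e = x - pred                                 # prediction error
        k = min(int(abs(e) / step), 1)               # magnitude index: 0 = inner, 1 = outer level
        sign = 1.0 if e >= 0 else -1.0
        prev = pred + sign * (k + 0.5) * step        # same value the decoder will produce
        codes.append((sign, k))                      # 1 sign bit + 1 magnitude bit
        rec.append(prev)
        step = next_step(step, k)
    return codes, rec

def adpcm_decode(codes, a=0.9, step0=0.02):
    step, prev, out = step0, 0.0, []
    for sign, k in codes:
        prev = a * prev + sign * (k + 0.5) * step
        out.append(prev)
        step = next_step(step, k)
    return out

# Example (illustrative): a 300 Hz tone whose amplitude jumps from 0.05 to 0.6.
s = [(0.05 if n < 200 else 0.6) * math.sin(2 * math.pi * 300 * n / 8000) for n in range(400)]
codes, rec_enc = adpcm_encode(s)
rec = adpcm_decode(codes)
```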

  12. Vocoders • Channel vocoder • Cepstral vocoder • Phase vocoder • Formant vocoder • Linear prediction coder.

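  13. Cepstral Vocoder • Encoder: s(n) is windowed by w(n) and transformed by a short-time DFT (stDFT); Log |.| followed by an IDFT gives the cepstrum, and a “low-time” lifter keeps the vocal-tract part. A pitch estimator provides the pitch and the voiced/unvoiced (V/U) decision. • Decoder: DFT, Exp(.) and IDFT rebuild the vocal-tract response, which is convolved with the excitation from the pitch pulse generator (voiced) or the noise generator (unvoiced), selected by V/U.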
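The analysis path of the encoder can be sketched as a real cepstrum with low-time liftering and a cepstral pitch pick. Assumptions: a Hamming window for w(n), a plain O(N²) DFT so the block needs no libraries (an FFT would be used in practice), a 60-400 Hz pitch search range, and a one-pole filter as a toy "vocal tract"; all function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) DFT (an FFT would be used in practice)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]

def real_cepstrum(frame):
    """Window w(n) -> DFT -> log|.| -> IDFT."""
    N = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]  # Hamming window
    X = dft([s * wn for s, wn in zip(frame, w)])
    log_mag = [math.log(abs(v) + 1e-12) for v in X]   # small floor avoids log(0)
    return [c.real for c in idft(log_mag)]

def low_time_lifter(c, L):
    """Keep quefrencies below L (and their mirror images): the vocal-tract envelope."""
    N = len(c)
    return [c[n] if n < L or n > N - L else 0.0 for n in range(N)]

def cepstral_pitch(c, fs, f_min=60.0, f_max=400.0):
    """Pitch from the largest cepstral peak in the high-time region."""
    lo, hi = int(fs / f_max), int(fs / f_min)
    peak = max(range(lo, hi + 1), key=lambda n: c[n])
    return fs / peak

# Example (illustrative): 100 Hz impulse train at 8 kHz through a one-pole "vocal tract".
fs, N, period = 8000, 512, 80
y, prev = [], 0.0
for n in range(N):
    prev = (1.0 if n % period == 0 else 0.0) + 0.9 * prev
    y.append(prev)
c = real_cepstrum(y)
f0 = cepstral_pitch(c, fs)          # close to 100 Hz
envelope = low_time_lifter(c, 30)   # the smoothed log spectrum lives in these coefficients
```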

  14. Linear Prediction in Speech Coding • Introduction • Generalities • Methods

  15. Introduction • These speech coders are called vocoders (voice coders). • Basic idea: estimate parameters → encode parameters → transmit parameters → decode parameters → synthesise speech. • They usually provide more bandwidth compression than is possible with waveform coding (2400-9600 bps).

  16. Generalities • LP Model • Parameter Estimation • Typical Memory requirements

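  17. LP Model [Block diagram: an impulse generator driven by the pitch period (voiced) or a white-noise generator (unvoiced) is selected by the voiced/unvoiced switch, scaled by the gain, and passed through an all-pole filter that models the glottal filter, vocal-tract filter and lip-radiation filter to give the speech signal.]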
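The source-filter model can be sketched in a few lines: an impulse train (voiced) or Gaussian white noise (unvoiced), scaled by the gain G, drives the all-pole filter 1/A(z). This sketch assumes the sign convention A(z) = 1 - Σ a_k z^(-k), i.e. s(n) = G e(n) + Σ a_k s(n-k), and folds the glottal, vocal-tract and lip-radiation filters into one set of coefficients a_k; the coefficient values in the example are illustrative.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train (voiced) or zero-mean, unit-variance white Gaussian noise (unvoiced)."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_synthesis(e, a, gain):
    """s(n) = G e(n) + sum_k a_k s(n-k): the filter G / A(z), A(z) = 1 - sum_k a_k z^-k."""
    s = []
    for n, en in enumerate(e):
        acc = gain * en
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * s[n - k]
        s.append(acc)
    return s

# Example (illustrative coefficients): one 20 ms frame at 8 kHz of each excitation type.
a = [1.3, -0.6]
voiced_frame = all_pole_synthesis(excitation(160, voiced=True), a, gain=1.0)
unvoiced_frame = all_pole_synthesis(excitation(160, voiced=False), a, gain=0.1)
```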

  18. Parameter Estimation • For each frame: • estimate the LP coefficients ($a_i$) • estimate the gain • estimate the type of excitation (voiced or unvoiced) • estimate the pitch.
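One common way to estimate the a_i and the gain for a frame is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below; note that the LPC-10 coder in slide 22 uses the covariance method instead. The recursion also yields the reflection coefficients used for quantisation. The synthetic AR(2) signal and the function names are illustrative.

```python
import math
import random

def autocorrelation(x, p):
    """r(k) = sum_n x(n) x(n+k) for k = 0..p (autocorrelation method, rectangular window)."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Return (a_1..a_p, reflection coefficients k_1..k_p, final prediction-error energy)."""
    a = [0.0] * (p + 1)          # a[0] unused; predictor s(n) ~ sum_i a[i] s(n - i)
    err = r[0]
    ks = []
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k       # prediction-error energy shrinks at each order
    return a[1:], ks, err

# Example (illustrative): a synthetic AR(2) process s(n) = 1.3 s(n-1) - 0.6 s(n-2) + w(n).
rng = random.Random(1)
s = [0.0, 0.0]
for _ in range(20000):
    s.append(1.3 * s[-1] - 0.6 * s[-2] + rng.gauss(0.0, 1.0))
a_hat, refl, err = levinson_durbin(autocorrelation(s, 2), 2)
gain = math.sqrt(err / len(s))   # per-sample excitation gain estimate (close to 1 here)
```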

  19. Typical Memory Requirements • Pitch coefficient (6 bits) • Gain (5 bits) • Model parameters: • LP coefficients (8-10 bits): small changes in the LP coefficients result in large changes in the pole positions. • Reflection coefficients (6 bits): if $|r_k|$ is near 1, quantisation causes large distortion. • Log-area ratios: a non-linear transformation of the reflection coefficients that expands the scale where $|r_k|$ is near 1.
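A minimal sketch of the log-area-ratio mapping, assuming the convention LAR = ln((1 + k)/(1 - k)) (some texts use the reciprocal ratio, which only flips the sign); its inverse is k = tanh(LAR/2). The function names are illustrative.

```python
import math

def log_area_ratio(k):
    """LAR = ln((1 + k) / (1 - k)) for a reflection coefficient |k| < 1."""
    return math.log((1.0 + k) / (1.0 - k))

def reflection_from_lar(g):
    """Inverse mapping: k = tanh(g / 2)."""
    return math.tanh(g / 2.0)

# The same 0.09 step in k, at two places on the scale:
near_zero = log_area_ratio(0.09) - log_area_ratio(0.0)   # about 0.18
near_one = log_area_ratio(0.99) - log_area_ratio(0.9)    # about 2.35
```

Near |k| = 1 the mapping stretches the scale: moving k from 0.9 to 0.99 changes the LAR more than ten times as much as moving it from 0 to 0.09, so a uniform quantiser on the LAR spends its levels where reflection-coefficient errors hurt most.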

  20. Methods • Introduction • LPC-10 • Analysis-by-Synthesis

  21. Introduction • The main difference between LP vocoders is how the excitation source is calculated.

  22. LPC-10 • Encoder: the speech signal is sampled by an ADC at 8 kHz and processed in windows of 180 samples. LP analysis (covariance method) gives the reflection coefficients: some are coded directly (4 bits) and some are non-linearly warped into LAR coefficients (4 and 5 bits). The pitch frequency (7 bits) and the voiced/unvoiced decision (1 bit) are obtained from the AMDF and zero crossings. • Decoder: receives 10 reflection coefficients (5 bits for one and 4 bits for the others), the pitch period (7 bits), the gain (5 bits) and the voiced/unvoiced switch (1 bit); an impulse generator or a white-noise generator drives the synthesis filter 1/A(z) to produce the synthesized speech signal.
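The pitch and voicing analysis can be sketched with the average magnitude difference function, AMDF(τ) = (1/(N-τ)) Σ |x(n) - x(n-τ)|, which dips at the pitch period, plus a zero-crossing rate. The 50-400 Hz search range, the normalisation by N-τ and the synthetic frames are assumptions; the sketch does not reproduce LPC-10's actual voicing logic.

```python
import math
import random

def amdf(x, lag):
    """Average magnitude difference at one lag; it dips near multiples of the pitch period."""
    return sum(abs(x[n] - x[n - lag]) for n in range(lag, len(x))) / (len(x) - lag)

def amdf_pitch_period(frame, fs=8000, f_min=50.0, f_max=400.0):
    """Pitch period in samples: the (first) lag with the smallest AMDF in the search range."""
    lags = range(int(fs / f_max), int(fs / f_min) + 1)
    return min(lags, key=lambda lag: amdf(frame, lag))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (high for noise-like frames)."""
    return sum((p >= 0) != (q >= 0) for p, q in zip(frame, frame[1:])) / (len(frame) - 1)

# Example (illustrative): 180-sample frames at 8 kHz, one periodic and one noise-like.
period = 64  # 125 Hz
cycle = [math.sin(2 * math.pi * n / period) + 0.5 * math.sin(6 * math.pi * n / period) for n in range(period)]
voiced_frame = (cycle * 3)[:180]
rng = random.Random(0)
noise_frame = [rng.gauss(0.0, 1.0) for _ in range(180)]
```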

  23. Analysis-by-Synthesis Methods • Introduction • Multipulse LPC Vocoder • Residual Excited Linear Prediction (RELP) • Code Excited Linear Prediction (CELP) Vocoder

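  24. Introduction [Block diagram of an analysis-by-synthesis encoder: the sampled speech goes to a buffer and LP analysis; a multipulse excitation generator drives the LP synthesis filter; the error between the original and the synthesized speech is passed through a perceptual weighting filter, and error minimisation selects the excitation.]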

  25. Multipulse LPC vocoder • The multipulse excitation consists of a short sequence of pulses chosen to minimise the energy of the perceptually weighted error. • For simplicity, the amplitude and location of the pulses are obtained sequentially, minimising the error energy for one pulse at a time. • In practice, 4-8 pulses are calculated every 5 ms.
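The sequential search can be sketched as follows, assuming `h` is the impulse response of the perceptually weighted synthesis filter and `target` is the weighted speech for one 5 ms subframe (40 samples at an assumed 8 kHz). For each candidate position m, with y_m the response to a unit pulse at m, the optimal amplitude is <t, y_m>/<y_m, y_m> and the error energy drops by <t, y_m>²/<y_m, y_m>; the best pulse is kept and its contribution subtracted before the next pulse is searched. Function names are hypothetical.

```python
def shifted(h, m, N):
    """Synthesis-filter response to a unit pulse at position m, truncated to N samples."""
    return [h[n - m] if 0 <= n - m < len(h) else 0.0 for n in range(N)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def multipulse_search(target, h, n_pulses):
    """Greedy search: add one pulse at a time, each minimising the remaining error energy."""
    N = len(target)
    responses = [shifted(h, m, N) for m in range(N)]   # y_m for every candidate position
    residual = list(target)
    pulses = []
    for _ in range(n_pulses):
        best = None
        for m, y in enumerate(responses):
            c, energy = dot(residual, y), dot(y, y)
            if energy == 0.0:
                continue
            score = c * c / energy                    # error reduction with the optimal amplitude
            if best is None or score > best[0]:
                best = (score, m, c / energy)         # c / energy is the optimal amplitude
        _, m, amp = best
        pulses.append((m, amp))
        residual = [r - amp * yn for r, yn in zip(residual, responses[m])]
    return pulses, residual

# Example (illustrative): a 40-sample subframe built from two known pulses.
h = [1.0, 0.6, 0.3, 0.1]
target = [2.0 * p - 1.0 * q for p, q in zip(shifted(h, 5, 40), shifted(h, 20, 40))]
pulses, residual = multipulse_search(target, h, 2)   # finds pulses at 5 and 20
```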

  26. Multipulse LPC • Encoder: the sampled speech is buffered for LP analysis, giving 10-12 reflection coefficients (5 bits) and the pitch filter parameters (6 bits). The multipulse excitation generator (pulses' locations 4 bits, pulses' amplitudes 4 bits, scale factor 6 bits) drives the pitch synthesis filter and the LP synthesis filter, and the perceptually weighted error is minimised. • Decoder: the excitation generator rebuilds the excitation E(z) from the pulses' locations and amplitudes, the scale factor and the pitch filter parameters, and passes it through the LP synthesis filter 1/A(z) to give the synthesized speech signal.

  27. Memory Requirements • Updated every 5 ms: • Scale factor (largest amplitude), log-quantised: 6 bits • Pulse amplitudes (relative to the largest one), linearly quantised: 4 bits • Updated every 20 ms: • Vocal tract parameters (reflection coefficients): 6 bits • Pitch period: 6 bits

  28. Multipulse LPC is effective for good-quality speech at 9600 bps. • It has been used for the airborne mobile satellite telephone service.

  29. Variations • Every time a new impulse location and amplitude are obtained, one can go back and reoptimise the amplitudes of the previous impulses. • Joint optimisation of all the amplitudes after all locations have been determined.
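The second variation amounts to a small least-squares problem: with the pulse locations fixed, the amplitudes β solve Φβ = c, where Φ_ij = <y_i, y_j> and c_i = <target, y_i>. This sketch assumes `h` is the impulse response of the perceptually weighted synthesis filter and each y_i is h shifted to pulse location i; a tiny Gaussian-elimination solver keeps it library-free. Function names are hypothetical.

```python
def shifted(h, m, N):
    """Synthesis-filter response to a unit pulse at position m, truncated to N samples."""
    return [h[n - m] if 0 <= n - m < len(h) else 0.0 for n in range(N)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def joint_amplitudes(target, h, locations):
    """Least-squares amplitudes for fixed pulse locations: solve Phi beta = c."""
    N = len(target)
    ys = [shifted(h, m, N) for m in locations]
    phi = [[dot(yi, yj) for yj in ys] for yi in ys]   # Phi_ij = <y_i, y_j>
    c = [dot(target, yi) for yi in ys]                # c_i = <target, y_i>
    return solve(phi, c)

# Example (illustrative): two overlapping pulses, where one-at-a-time amplitudes would be biased.
h = [1.0, 0.6, 0.3, 0.1]
target = [1.5 * p - 0.7 * q for p, q in zip(shifted(h, 5, 40), shifted(h, 6, 40))]
amps = joint_amplitudes(target, h, [5, 6])   # close to [1.5, -0.7]
```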

  30. Code Excited Linear Prediction (CELP) Vocoder • The excitation signal is selected from a codebook of zero-mean Gaussian sequences. • LP coefficients are calculated around every 20 ms.

  31. CELP • Encoder: the sampled speech is buffered for LP analysis, giving 10-12 reflection coefficients (5 bits) and the pitch filter parameters (6 bits). Each sequence in the Gaussian excitation codebook, scaled by the gain factor (6 bits), drives the pitch synthesis filter and the LP synthesis filter; the perceptually weighted error is minimised to select the index of the excitation sequence (4 bits). • Decoder: from the reflection coefficients, pitch filter parameters, gain factor and excitation index, the excitation generator rebuilds E(z) and passes it through 1/A(z) to give the synthesized speech signal.

  32. With a codebook of 1024 sequences, toll-quality speech can be obtained. • Rate: around 4.8 kbps.
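The CELP codebook search can be sketched as an exhaustive loop: each Gaussian codeword is filtered through `h`, an assumed FIR-truncated impulse response of the weighted synthesis filter, its optimal gain <x, y>/<y, y> is computed, and the codeword maximising <x, y>²/<y, y> is kept. The codebook size, frame length and `h` are illustrative, and the pitch-filter contribution is ignored; function names are hypothetical.

```python
import random

def gaussian_codebook(size, length, seed=0):
    """Codebook of zero-mean, unit-variance Gaussian sequences (fixed seed, so encoder and decoder agree)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]

def filter_fir(x, h):
    """Zero-state filtering of x by the truncated impulse response h, keeping len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]

def celp_search(target, codebook, h):
    """Return (index, gain) of the codeword whose filtered, scaled version best matches target."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for i, code in enumerate(codebook):
        y = filter_fir(code, h)
        energy = sum(v * v for v in y)
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy        # reduction in squared error with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain

# Example (illustrative sizes): 128 codewords of 40 samples (5 ms at 8 kHz).
h = [1.0, 0.7, 0.4, 0.2]
codebook = gaussian_codebook(128, 40)
target = [0.5 * v for v in filter_fir(codebook[37], h)]
index, gain = celp_search(target, codebook, h)   # recovers codeword 37 with gain 0.5
```

With a 1024-entry codebook the same loop simply runs over more codewords, which is why the search dominates the encoder's cost.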

  33. Variations • Low-Delay CELP • VSELP

  34. Topics to Evaluate • Vocal Tract Physiological Model. • Linear Prediction (LP). • Relationship between Vocal Tract Physiological Model and LP. • Filterbank and Signal Processing. • HMM • Basics • Applied to Speech Recognition. • Parameter re-estimation.
