Linear Predictive Coding for Speech Compression



  1. Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006

  2. Overview • General Model for Speech Synthesis • Channel Vocoder • Linear Predictive Coder (LPC-10) • Code Excited Linear Prediction (CELP) • Novel Application: Sub-band adaptive filtering based on a cochlear model

  3. Model for Speech Synthesis • Speech is produced by forcing air through the vocal cords, larynx, pharynx, mouth, and nose • At the transmitter, speech is divided into segments • Each segment is analyzed to determine the excitation signal and the parameters of the vocal tract filter [Block diagram: Excitation Source → Vocal tract filter → Speech]

  4. Channel Vocoder - analysis • Each segment of input speech is analyzed by a bank of (bandpass) analysis filters • The energy at the output of each filter is estimated 50 times a second and transmitted to the receiver • A decision is made whether the segment is voiced (/a/, /e/, /o/) or unvoiced (/s/, /f/) • An estimate of the pitch period (the period of the fundamental harmonic) is determined
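
As a rough sketch of this analysis step, the code below estimates per-band energy for one segment using a small bank of Butterworth bandpass filters. The band edges, filter order, and 20 ms segment are illustrative assumptions, not the layout of any particular vocoder.

    import numpy as np
    from scipy.signal import butter, lfilter

    def band_energies(segment, fs, edges):
        # Estimate short-time energy at the output of each analysis filter.
        energies = []
        for low, high in edges:
            b, a = butter(4, [low, high], btype="bandpass", fs=fs)
            energies.append(np.mean(lfilter(b, a, segment) ** 2))
        return energies

    fs = 8000
    t = np.arange(fs // 50) / fs            # one 20 ms segment (50 updates/sec)
    segment = np.sin(2 * np.pi * 150 * t)   # toy "voiced" input
    print(band_energies(segment, fs, [(100, 400), (400, 1200), (1200, 3400)]))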

  5. Voiced vs. Unvoiced Speech

  6. Channel Vocoder - synthesis • The vocal tract filter is implemented by a bank of (bandpass) synthesis filters • For voiced segments, a periodic pulse generator provides the input • For unvoiced segments, a pseudonoise source provides the input • The pulse period is determined by the pitch estimate • Each band is scaled by the output of the corresponding energy estimate • The channel vocoder was the first approach to speech compression

  7. Linear Predictive Coder • Models the vocal tract as a single linear filter: yn = ∑ ai yn−i + G εn • Output: yn, Input: εn, Gain: G • The input is random noise (unvoiced) or a periodic pulse train (voiced) • LPC-10 is a standard (2.4 kbits/sec at 8000 samples/sec)
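
A minimal sketch of this synthesis equation, with made-up predictor coefficients and a pulse-train excitation (none of these values come from a real analysis):

    import numpy as np

    def lpc_synthesize(a, gain, excitation):
        # All-pole synthesis: y[n] = sum_i a[i] * y[n-1-i] + G * e[n]
        y = np.zeros(len(excitation))
        for n in range(len(excitation)):
            past = sum(a[i] * y[n - 1 - i] for i in range(min(len(a), n)))
            y[n] = past + gain * excitation[n]
        return y

    e = np.zeros(400)
    e[::80] = 1.0    # one pulse per assumed 80-sample pitch period (voiced)
    y = lpc_synthesize([0.9, -0.2], gain=1.0, excitation=e)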

  8. LPC - Voiced/Unvoiced Decision • Voiced speech has more energy and lower frequency content than unvoiced • The speech segment is lowpass filtered, and the energy at the output relative to the background noise is used to decide • Zero-crossings are counted to estimate frequency • Continuity criterion: the voicing decisions of neighboring frames are taken into account
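
A toy version of this decision, assuming hand-picked thresholds; a real coder would tune them against measured background noise and also apply the continuity criterion across frames:

    import numpy as np

    def voiced_unvoiced(segment, energy_thresh=0.01, zcr_thresh=0.25):
        # High energy and a low zero-crossing rate suggest voiced speech.
        energy = np.mean(segment ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(segment)))) / 2.0  # crossings per sample
        return "voiced" if energy > energy_thresh and zcr < zcr_thresh else "unvoiced"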

  9. LPC - Estimating Pitch Period • Extracting pitch from a short, noisy segment is difficult • One approach is to maximize the autocorrelation, but this fails when the periodicity isn't strong enough • A fixed threshold on the autocorrelation can't be used because the maximum value isn't known in advance

  10. LPC - Estimating Pitch Period • LPC-10 uses the average magnitude difference function (AMDF): AMDF(P) = (1/N) ∑ |yi − yi−P| • If {yn} is periodic with period P0, samples P0 apart will have values close to each other, and the AMDF will have a minimum at P0 • The AMDF is periodic for voiced segments and roughly flat for unvoiced • The AMDF is minimized when P is the pitch period, and the spurious minima in unvoiced segments are shallow
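
A direct sketch of the AMDF search; the lag range below is an assumption covering roughly 50-400 Hz at 8000 samples/sec:

    import numpy as np

    def amdf_pitch(y, p_min, p_max):
        # AMDF(P) = (1/N) * sum |y[i] - y[i-P]|; the pitch period is the
        # lag with the smallest average magnitude difference.
        best_p, best_val = p_min, np.inf
        for P in range(p_min, p_max + 1):
            d = np.mean(np.abs(y[P:] - y[:-P]))
            if d < best_val:
                best_p, best_val = P, d
        return best_p

    fs = 8000
    t = np.arange(fs // 25) / fs
    y = np.sin(2 * np.pi * 100 * t)            # 100 Hz tone: 80-sample period
    print(amdf_pitch(y, fs // 400, fs // 50))  # prints 80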

  11. LPC - Obtaining the Vocal Tract Filter • At the transmitter, we want the filter coefficients that best match the segment in the mean squared error sense: en² = (yn − ∑ ai yn−i − G εn)² • The autocorrelation approach assumes {yn} is stationary, giving the normal equations RA = P, so A = R⁻¹P • Because R is Toeplitz, a recursive solution exists: the Levinson-Durbin algorithm, sketched below
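
A sketch of the Levinson-Durbin recursion, where r holds the estimated autocorrelation lags r(0)..r(M); the toy lag values in the usage are illustrative:

    import numpy as np

    def levinson_durbin(r, order):
        # Solve R A = P recursively; also yields the reflection coefficients k.
        a = np.zeros(order)
        k = np.zeros(order)
        err = r[0]
        for i in range(order):
            acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
            k[i] = acc / err
            if i > 0:
                a[:i] = a[:i] - k[i] * a[i - 1::-1]
            a[i] = k[i]
            err *= 1.0 - k[i] ** 2
        return a, k

    r = np.array([1.0, 0.5, 0.25, 0.125])   # lags of an AR(1)-like process
    a, k = levinson_durbin(r, order=3)      # a -> [0.5, 0, 0]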

  12. LPC - Obtaining the Vocal Tract Filter • The covariance approach discards the stationarity assumption (which is not valid for speech signals): cij = E[yn−i yn−j] yields CA = S

  13. LPC - Obtaining the Vocal Tract Filter • The cij are estimated as cij = ∑n yn−i yn−j • Values of yn outside the segment are no longer assumed to be zero • C is symmetric but not Toeplitz, so a Cholesky decomposition is required, as in the sketch below • The reflection coefficients are used to update the voicing decision
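
A sketch of the covariance method on a short segment: C is symmetric but not Toeplitz, so the normal equations CA = S are solved through the Cholesky factor of C. The segment length and filter order below are arbitrary examples.

    import numpy as np

    def covariance_lpc(y, order):
        # c_ij = sum_n y[n-i] * y[n-j], summing only over n inside the
        # segment so no zeros are assumed outside it.
        N, M = len(y), order
        C = np.empty((M, M))
        S = np.empty(M)
        for i in range(1, M + 1):
            S[i - 1] = sum(y[n] * y[n - i] for n in range(M, N))
            for j in range(1, M + 1):
                C[i - 1, j - 1] = sum(y[n - i] * y[n - j] for n in range(M, N))
        L = np.linalg.cholesky(C)                           # C = L L^T
        return np.linalg.solve(L.T, np.linalg.solve(L, S))  # solves C A = S

    rng = np.random.default_rng(0)
    a = covariance_lpc(rng.standard_normal(240), order=4)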

  14. LPC - Transmitting Parameters • A tenth-order filter is used for voiced speech and a fourth-order filter for unvoiced • The vocal tract filter is sensitive to errors in reflection coefficients close to one, so gi = (1 + ki)/(1 − ki) are quantized and sent instead of the ki
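
A small illustration of that mapping: reflection coefficients near one are spread far apart, so a uniform quantizer on the gi gives them more resolution.

    import numpy as np

    def to_g(k):
        # g_i = (1 + k_i) / (1 - k_i); stretches the sensitive region near k = 1
        k = np.asarray(k, dtype=float)
        return (1.0 + k) / (1.0 - k)

    def from_g(g):
        # Inverse mapping applied at the receiver
        g = np.asarray(g, dtype=float)
        return (g - 1.0) / (g + 1.0)

    print(to_g([0.90, 0.99]))   # ≈ [19, 199]: close k values spread apart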

  15. Code Excited Linear Prediction • A single pulse per pitch period leads to a buzzy twang • Instead, a variety of excitation signals is allowed • For each segment, the encoder finds the excitation vector that generates synthesized speech best matching the speech being coded
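
A bare-bones sketch of that analysis-by-synthesis search, using a random stand-in codebook and a pass-through synthesis filter (both hypothetical, shown only to illustrate the matching loop):

    import numpy as np

    def celp_search(target, codebook, synth):
        # Try every candidate excitation and keep the one whose synthesized
        # output is closest to the target segment (minimum squared error).
        best_idx, best_err = -1, np.inf
        for idx, excitation in enumerate(codebook):
            err = np.sum((target - synth(excitation)) ** 2)
            if err < best_err:
                best_idx, best_err = idx, err
        return best_idx

    rng = np.random.default_rng(1)
    codebook = rng.standard_normal((64, 40))   # 64 candidate 40-sample vectors
    target = codebook[17] + 0.01 * rng.standard_normal(40)
    print(celp_search(target, codebook, synth=lambda e: e))  # -> 17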

  16. Sub-band adaptive filtering • A multi-channel speech enhancement system • The greater the number of sub-bands used, the faster the convergence of the overall system

  17. Cochlear Modelling • The sub-band filters are distributed logarithmically in frequency to approximate the distribution of filters in the cochlea
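
For instance, logarithmically spaced band edges can be generated directly; the 100 Hz to 4 kHz range and eight bands below are assumed values:

    import numpy as np

    def log_band_edges(f_low, f_high, n_bands):
        # Edges spaced evenly on a log-frequency axis, so bandwidth grows
        # with center frequency, roughly like the cochlear filter map.
        return np.geomspace(f_low, f_high, n_bands + 1)

    print(np.round(log_band_edges(100.0, 4000.0, 8), 1))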

  18. Adaptive Noise Cancellation • The LMS algorithm is used to model the differential transfer function between the noise signals in a number of sub-bands • Lower power and shorter filters are used in each sub-band • Convergence is equal across all bands if the power is distributed equally and the filter lengths are the same • Otherwise, convergence is dominated by the sub-band with the greatest power
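
A sketch of a single sub-band's LMS noise canceller; the tap count and step size are illustrative choices, and a full system would run one such filter per sub-band:

    import numpy as np

    def lms_cancel(primary, reference, n_taps=16, mu=0.01):
        # Adapt an FIR filter so that the filtered noise reference predicts
        # the noise in the primary channel; the residual is the enhanced speech.
        w = np.zeros(n_taps)
        out = np.zeros(len(primary))
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]   # most recent sample first
            e = primary[n] - np.dot(w, x)       # error = speech estimate
            w += mu * e * x                     # LMS weight update
            out[n] = e
        return out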
