Linear Predictive Coding for Speech Compression



  1. Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006

  2. Overview • General Model for Speech Synthesis • Channel Vocoder • Linear Predictive Coder (LPC-10) • Code Excited Linear Prediction (CELP) • Novel Application: Sub-band adaptive filtering based on a cochlear model

  3. Model for Speech Synthesis • Speech is produced by forcing air through the vocal cords, larynx, pharynx, mouth, and nose • At the transmitter, speech is divided into segments • Each segment is analyzed to determine the excitation signal and the parameters of the vocal tract filter [Block diagram: Excitation Source → Vocal tract filter → Speech]

  4. Channel Vocoder - analysis • Each segment of input speech is analyzed by a bank of (bandpass) analysis filters • The energy at the output of each filter is estimated 50 times a second and transmitted to the receiver • A decision is made whether the segment is voiced (/a/, /e/, /o/) or unvoiced (/s/, /f/) • An estimate of the pitch period (the period of the fundamental harmonic) is determined
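
As a rough sketch of this analysis step, the code below estimates per-band energy for one segment using a small bank of Butterworth bandpass filters. The band edges, filter order, and 20 ms segment are illustrative assumptions, not the layout of any particular vocoder.

    import numpy as np
    from scipy.signal import butter, lfilter

    def band_energies(segment, fs, edges):
        # Estimate short-time energy at the output of each analysis filter.
        energies = []
        for low, high in edges:
            b, a = butter(4, [low, high], btype="bandpass", fs=fs)
            energies.append(np.mean(lfilter(b, a, segment) ** 2))
        return energies

    fs = 8000
    t = np.arange(fs // 50) / fs            # one 20 ms segment (50 updates/sec)
    segment = np.sin(2 * np.pi * 150 * t)   # toy "voiced" input
    print(band_energies(segment, fs, [(100, 400), (400, 1200), (1200, 3400)]))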

  5. Voiced vs. Unvoiced Speech

  6. Channel Vocoder - synthesis • The vocal tract filter is implemented by a bank of (bandpass) synthesis filters • For voiced segments, a periodic pulse generator provides the input • For unvoiced segments, a pseudonoise source provides the input • The pulse period is determined by the pitch estimate • Each band is scaled by the output of the corresponding energy estimate • The channel vocoder was the first approach to speech compression

  7. Linear Predictive Coder • Models the vocal tract as a single linear filter: yn = ∑ ai yn−i + G εn • Output: yn, Input: εn, Gain: G • The input is random noise (unvoiced) or a periodic pulse train (voiced) • LPC-10 is a standard (2.4 kbits/sec at 8000 samples/sec)
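
A minimal sketch of this synthesis equation, with made-up predictor coefficients and a pulse-train excitation (none of these values come from a real analysis):

    import numpy as np

    def lpc_synthesize(a, gain, excitation):
        # All-pole synthesis: y[n] = sum_i a[i] * y[n-1-i] + G * e[n]
        y = np.zeros(len(excitation))
        for n in range(len(excitation)):
            past = sum(a[i] * y[n - 1 - i] for i in range(min(len(a), n)))
            y[n] = past + gain * excitation[n]
        return y

    e = np.zeros(400)
    e[::80] = 1.0    # one pulse per assumed 80-sample pitch period (voiced)
    y = lpc_synthesize([0.9, -0.2], gain=1.0, excitation=e)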

  8. LPC - Voiced/Unvoiced Decision • Voiced speech has more energy and lower frequency content than unvoiced • The speech segment is lowpass filtered, and the energy at the output relative to the background noise is used to decide • Zero-crossings are counted to estimate frequency • Continuity criterion: the voicing decisions of neighboring frames are taken into account
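
A toy version of this decision, assuming hand-picked thresholds; a real coder would tune them against measured background noise and also apply the continuity criterion across frames:

    import numpy as np

    def voiced_unvoiced(segment, energy_thresh=0.01, zcr_thresh=0.25):
        # High energy and a low zero-crossing rate suggest voiced speech.
        energy = np.mean(segment ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(segment)))) / 2.0  # crossings per sample
        return "voiced" if energy > energy_thresh and zcr < zcr_thresh else "unvoiced"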

  9. LPC - Estimating Pitch Period • Extracting pitch from a short, noisy segment is difficult • One approach is to maximize the autocorrelation, but this fails when the periodicity isn't strong enough • A fixed threshold on the autocorrelation can't be used because the maximum value isn't known in advance

  10. LPC - Estimating Pitch Period • LPC-10 uses the average magnitude difference function (AMDF): AMDF(P) = (1/N) ∑ |yi − yi−P| • If {yn} is periodic with period P0, samples P0 apart will have values close to each other, and the AMDF will have a minimum at P0 • The AMDF is periodic for voiced segments and roughly flat for unvoiced • The AMDF is minimized when P is the pitch period, and the spurious minima in unvoiced segments are shallow
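
A direct sketch of the AMDF search; the lag range below is an assumption covering roughly 50-400 Hz at 8000 samples/sec:

    import numpy as np

    def amdf_pitch(y, p_min, p_max):
        # AMDF(P) = (1/N) * sum |y[i] - y[i-P]|; the pitch period is the
        # lag with the smallest average magnitude difference.
        best_p, best_val = p_min, np.inf
        for P in range(p_min, p_max + 1):
            d = np.mean(np.abs(y[P:] - y[:-P]))
            if d < best_val:
                best_p, best_val = P, d
        return best_p

    fs = 8000
    t = np.arange(fs // 25) / fs
    y = np.sin(2 * np.pi * 100 * t)            # 100 Hz tone: 80-sample period
    print(amdf_pitch(y, fs // 400, fs // 50))  # prints 80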

  11. LPC - Obtaining the Vocal Tract Filter • At the transmitter, we want the filter coefficients that best match the segment in the mean squared error sense: en² = (yn − ∑ ai yn−i − G εn)² • The autocorrelation approach assumes {yn} is stationary, giving the normal equations RA = P, so A = R⁻¹P • Because R is Toeplitz, a recursive solution exists: the Levinson-Durbin algorithm, sketched below
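
A sketch of the Levinson-Durbin recursion, where r holds the estimated autocorrelation lags r(0)..r(M); the toy lag values in the usage are illustrative:

    import numpy as np

    def levinson_durbin(r, order):
        # Solve R A = P recursively; also yields the reflection coefficients k.
        a = np.zeros(order)
        k = np.zeros(order)
        err = r[0]
        for i in range(order):
            acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
            k[i] = acc / err
            if i > 0:
                a[:i] = a[:i] - k[i] * a[i - 1::-1]
            a[i] = k[i]
            err *= 1.0 - k[i] ** 2
        return a, k

    r = np.array([1.0, 0.5, 0.25, 0.125])   # lags of an AR(1)-like process
    a, k = levinson_durbin(r, order=3)      # a -> [0.5, 0, 0]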

  12. LPC - Obtaining the Vocal Tract Filter • The covariance approach discards the stationarity assumption (which is not valid for speech signals): cij = E[yn−i yn−j] yields CA = S

  13. LPC - Obtaining the Vocal Tract Filter • The cij are estimated as cij = ∑n yn−i yn−j • Values of yn outside the segment are no longer assumed to be zero • C is symmetric but not Toeplitz, so a Cholesky decomposition is required, as in the sketch below • The reflection coefficients are used to update the voicing decision
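
A sketch of the covariance method on a short segment: C is symmetric but not Toeplitz, so the normal equations CA = S are solved through the Cholesky factor of C. The segment length and filter order below are arbitrary examples.

    import numpy as np

    def covariance_lpc(y, order):
        # c_ij = sum_n y[n-i] * y[n-j], summing only over n inside the
        # segment so no zeros are assumed outside it.
        N, M = len(y), order
        C = np.empty((M, M))
        S = np.empty(M)
        for i in range(1, M + 1):
            S[i - 1] = sum(y[n] * y[n - i] for n in range(M, N))
            for j in range(1, M + 1):
                C[i - 1, j - 1] = sum(y[n - i] * y[n - j] for n in range(M, N))
        L = np.linalg.cholesky(C)                           # C = L L^T
        return np.linalg.solve(L.T, np.linalg.solve(L, S))  # solves C A = S

    rng = np.random.default_rng(0)
    a = covariance_lpc(rng.standard_normal(240), order=4)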

  14. LPC - Transmitting Parameters • A tenth-order filter is used for voiced speech and a fourth-order filter for unvoiced • The vocal tract filter is sensitive to errors in reflection coefficients close to one, so gi = (1 + ki)/(1 − ki) are quantized and sent instead of the ki
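
A small illustration of that mapping: reflection coefficients near one are spread far apart, so a uniform quantizer on the gi gives them more resolution.

    import numpy as np

    def to_g(k):
        # g_i = (1 + k_i) / (1 - k_i); stretches the sensitive region near k = 1
        k = np.asarray(k, dtype=float)
        return (1.0 + k) / (1.0 - k)

    def from_g(g):
        # Inverse mapping applied at the receiver
        g = np.asarray(g, dtype=float)
        return (g - 1.0) / (g + 1.0)

    print(to_g([0.90, 0.99]))   # ≈ [19, 199]: close k values spread apart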

  15. Code Excited Linear Prediction • A single pulse per pitch period leads to a buzzy twang • Instead, a variety of excitation signals is allowed • For each segment, the encoder finds the excitation vector that generates synthesized speech best matching the speech being coded
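
A bare-bones sketch of that analysis-by-synthesis search, using a random stand-in codebook and a pass-through synthesis filter (both hypothetical, shown only to illustrate the matching loop):

    import numpy as np

    def celp_search(target, codebook, synth):
        # Try every candidate excitation and keep the one whose synthesized
        # output is closest to the target segment (minimum squared error).
        best_idx, best_err = -1, np.inf
        for idx, excitation in enumerate(codebook):
            err = np.sum((target - synth(excitation)) ** 2)
            if err < best_err:
                best_idx, best_err = idx, err
        return best_idx

    rng = np.random.default_rng(1)
    codebook = rng.standard_normal((64, 40))   # 64 candidate 40-sample vectors
    target = codebook[17] + 0.01 * rng.standard_normal(40)
    print(celp_search(target, codebook, synth=lambda e: e))  # -> 17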

  16. Sub-band adaptive filtering • A multi-channel speech enhancement system • The greater the number of sub-bands used, the faster the convergence of the overall system

  17. Cochlear Modelling • The sub-band filters are distributed logarithmically in frequency to approximate the distribution of filters in the cochlea
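
For instance, logarithmically spaced band edges can be generated directly; the 100 Hz to 4 kHz range and eight bands below are assumed values:

    import numpy as np

    def log_band_edges(f_low, f_high, n_bands):
        # Edges spaced evenly on a log-frequency axis, so bandwidth grows
        # with center frequency, roughly like the cochlear filter map.
        return np.geomspace(f_low, f_high, n_bands + 1)

    print(np.round(log_band_edges(100.0, 4000.0, 8), 1))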

  18. Adaptive Noise Cancellation • The LMS algorithm is used to model the differential transfer function between the noise signals in a number of sub-bands • Lower power and shorter filters are used in each sub-band • Convergence is equal across all bands if the power is distributed equally and the filter lengths are the same • Otherwise, convergence is dominated by the sub-band with the greatest power
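
A sketch of a single sub-band's LMS noise canceller; the tap count and step size are illustrative choices, and a full system would run one such filter per sub-band:

    import numpy as np

    def lms_cancel(primary, reference, n_taps=16, mu=0.01):
        # Adapt an FIR filter so that the filtered noise reference predicts
        # the noise in the primary channel; the residual is the enhanced speech.
        w = np.zeros(n_taps)
        out = np.zeros(len(primary))
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]   # most recent sample first
            e = primary[n] - np.dot(w, x)       # error = speech estimate
            w += mu * e * x                     # LMS weight update
            out[n] = e
        return out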
