
Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec

Presentation Transcript


  1. Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec Presented by Peter

  2. AMR Narrow Band • Adaptive Multi-Rate codec for narrow-band speech (AMR-NB) • Specified by 3GPP for GSM/3G systems • Input: 8 kHz sampling rate, 13-bit PCM • 20 ms frames, no overlap • 8 modes + comfort noise • Output bit rate from 4.75 to 12.2 kbit/s • Algebraic Code Excited Linear Prediction (ACELP) is the underlying coding algorithm
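
The frame figures on this slide imply some simple arithmetic. The sketch below is illustrative only: the constant and function names are assumptions, and the bit counts cover the speech payload of one 20 ms frame, not any file or transport framing.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

def samples_per_frame(rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of PCM samples in one frame."""
    return rate_hz * frame_ms // 1000

def bits_per_frame(kbps, frame_ms=FRAME_MS):
    """Speech bits in one frame: kbit/s times ms gives bits."""
    return round(kbps * frame_ms)

if __name__ == "__main__":
    print(samples_per_frame())
    for mode in MODES_KBPS:
        print(mode, bits_per_frame(mode))
```

With four sub-frames per frame, each sub-frame covers 5 ms, or 40 samples.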

  3. Frequency Response

  4. Speech Encoder • Pre-processing • Linear prediction analysis and quantization • Open-loop pitch analysis • Impulse response computation • Target signal computation • Adaptive codebook • Algebraic codebook • Quantization of the adaptive and fixed codebook gains • Memory update

  5. Principles of the adaptive multi-rate speech encoder • Eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s • A 10th-order linear prediction (LP), or short-term, synthesis filter is used, given by $H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}$ • The long-term, or pitch, synthesis filter is given by $\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}$, where $T$ is the pitch delay and $g_p$ the pitch gain • The pitch synthesis filter is implemented using the adaptive codebook approach

  6. ACELP

  7. Pre-Processing • Two pre-processing functions • high-pass filtering • signal down-scaling (to prevent overflow) • A high-pass filter with a cut-off frequency of 80 Hz is used
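
The two pre-processing steps can be sketched as below. This is a stand-in, not the codec's filter: the standard specifies a second-order high-pass filter with fixed coefficients, whereas this sketch swaps in a generic first-order RC-style high-pass with the same 80 Hz cut-off. The function name and defaults are assumptions.

```python
import math

def preprocess(samples, fs=8000.0, cutoff_hz=80.0):
    """Halve the input, then apply a first-order high-pass (illustrative only)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / fs)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for s in samples:
        x = s / 2.0                        # down-scaling by 2 against overflow
        y = alpha * (prev_y + x - prev_x)  # first-order high-pass recursion
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A constant (DC) input decays towards zero at the output, while a 1 kHz tone passes at roughly half its input amplitude because of the down-scaling.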

  8. Linear Prediction Analysis • Frame is split into four sub-frames • 12.2 kbit/s mode • Performed twice per frame • 30 ms asymmetric window • No lookahead • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • Performed once per frame • 30 ms asymmetric window • 5 ms lookahead

  9. Windowing and Auto-correlation Computation • 12.2 kbit/s mode • Two different asymmetric windows • 1st window concentrates on 2nd sub-frame • 2nd window concentrates on 4th sub-frame

  10. Windowing and Auto-correlation Computation • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • One asymmetric window • Concentrates on 4th sub-frame • 5 ms (40 samples) lookahead

  11. Auto-correlation Computation • The auto-correlation $r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k)$ is computed for lags $k = 0, \ldots, 10$ • $s'(n)$ is the windowed speech • 60 Hz bandwidth expansion is applied by lag windowing • $r_{ac}(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB
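
A minimal sketch of this step, assuming a 240-sample (30 ms) windowed frame and a Gaussian lag window of the form exp(-0.5 (2 pi f0 k / fs)^2) with f0 = 60 Hz, which is the form I understand the 3GPP specification to use. The function name and argument names are illustrative.

```python
import math

def autocorrelation(windowed, order=10, fs=8000.0, f0=60.0):
    """Autocorrelation for lags 0..order with noise correction and lag windowing."""
    n = len(windowed)
    r = [sum(windowed[i] * windowed[i - k] for i in range(k, n))
         for k in range(order + 1)]
    r[0] *= 1.0001  # white noise correction (noise floor at about -40 dB)
    for k in range(1, order + 1):
        # Gaussian lag window: widens spectral peaks by roughly f0 Hz
        r[k] *= math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)
    return r
```

The returned list is the modified autocorrelation that feeds the Levinson-Durbin recursion.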

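  12. Levinson-Durbin algorithm • The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are obtained from the modified (lag-windowed) auto-correlations $r'_{ac}$ by solving the set of equations $\sum_{k=1}^{10} a_k\, r'_{ac}(|i-k|) = -r'_{ac}(i)$, $i = 1, \ldots, 10$ • The algorithm uses the following recursion: $E(0) = r'_{ac}(0)$; for $i = 1, \ldots, 10$, with $a_0^{(i-1)} = 1$: $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)} r'_{ac}(i-j)\right] / E(i-1)$, $a_i^{(i)} = k_i$, $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$ for $j = 1, \ldots, i-1$, $E(i) = (1 - k_i^2)\, E(i-1)$ • The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$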
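
The Levinson-Durbin recursion named on this slide can be sketched in a few lines. The sign convention assumed here is A(z) = 1 + sum of a_i z^-i, so the returned list starts with 1.0; the function name is illustrative.

```python
def levinson_durbin(r, order=10):
    """Solve the LP normal equations; returns ([1, a1..a_order], prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient k_i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # E(i) = (1 - k_i^2) E(i-1)
    return a, err
```

As a sanity check, the autocorrelation of a first-order process, r(k) = 0.5^k, yields a single non-zero coefficient a1 = -0.5 with the remaining coefficients at zero.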

  13. LP to LSP conversion • The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes • The LSPs are defined as the roots of the sum and difference polynomials $F_1'(z) = A(z) + z^{-11} A(z^{-1})$ and $F_2'(z) = A(z) - z^{-11} A(z^{-1})$ • All roots of these polynomials lie on the unit circle and alternate with each other • The fixed roots at $z = -1$ (in $F_1'$) and $z = 1$ (in $F_2'$) are eliminated
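
A hedged sketch of the conversion for a 10th-order filter. It builds the sum and difference polynomials with their fixed roots at z = -1 and z = 1 divided out, then looks for sign changes on the unit circle. The reference implementation evaluates Chebyshev polynomials on a coarse grid; this sketch swaps in direct cosine evaluation with bisection, and all names are illustrative.

```python
import math

def lp_to_lsf(a, grid=512):
    """a = [1, a1, ..., a10]; returns 10 line spectral frequencies in radians."""
    f1, f2 = [1.0], [1.0]
    for i in range(5):
        f1.append(a[i + 1] + a[10 - i] - f1[i])  # F1'(z) / (1 + z^-1)
        f2.append(a[i + 1] - a[10 - i] + f2[i])  # F2'(z) / (1 - z^-1)

    def evaluate(f, w):
        # Real-valued form of a symmetric degree-10 polynomial on the unit circle
        return f[5] + 2.0 * sum(f[i] * math.cos((5 - i) * w) for i in range(5))

    def roots(f):
        found = []
        w_lo, g_lo = 0.0, evaluate(f, 0.0)
        for step in range(1, grid + 1):
            w_hi = math.pi * step / grid
            g_hi = evaluate(f, w_hi)
            if g_lo * g_hi < 0.0:            # sign change: refine by bisection
                lo, hi, g_left = w_lo, w_hi, g_lo
                for _ in range(40):
                    mid = 0.5 * (lo + hi)
                    g_mid = evaluate(f, mid)
                    if g_left * g_mid <= 0.0:
                        hi = mid
                    else:
                        lo, g_left = mid, g_mid
                found.append(0.5 * (lo + hi))
            w_lo, g_lo = w_hi, g_hi
        return found

    return sorted(roots(f1) + roots(f2))
```

The LSPs themselves are the cosines of these frequencies. The grid search assumes roots are not closer together than one grid step, which a production implementation would need to handle explicitly.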

  14. LP to LSP conversion

  15. Quantization of the LSP coefficients • 12.2 kbit/s mode • Two sets of LSPs are quantized using their representation in the frequency domain (LSFs) • 1st-order MA prediction is applied • The two residual LSF vectors are jointly quantized using split matrix quantization (SMQ) • A weighted LSP distortion measure is used in the quantization process • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • 1st-order MA prediction is applied • The residual LSF vector is quantized using split vector quantization • A weighted LSP distortion measure is used

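  16. Interpolation of the LSPs • 12.2 kbit/s mode • interpolated LSP vectors at the 1st and 3rd subframes are given by $\hat{q}_1^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_2^{(n)}$ and $\hat{q}_3^{(n)} = 0.5\,\hat{q}_2^{(n)} + 0.5\,\hat{q}_4^{(n)}$ • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • interpolated LSP vectors at the 1st, 2nd, and 3rd subframes are given by $\hat{q}_1^{(n)} = 0.75\,\hat{q}_4^{(n-1)} + 0.25\,\hat{q}_4^{(n)}$, $\hat{q}_2^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_4^{(n)}$ and $\hat{q}_3^{(n)} = 0.25\,\hat{q}_4^{(n-1)} + 0.75\,\hat{q}_4^{(n)}$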
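
A small sketch of the per-subframe interpolation, assuming the weights I understand the 3GPP specification to use: 0.5/0.5 around the transmitted 2nd and 4th subframe vectors in the 12.2 kbit/s mode, and 0.75/0.25, 0.5/0.5, 0.25/0.75 between the previous and current 4th subframe vectors in the other modes. Function and argument names are hypothetical.

```python
def interpolate_lsp(prev_q4, cur_q4, cur_q2=None):
    """Return the LSP vectors for subframes 1..4 of the current frame."""
    def mix(x, wx, y, wy):
        return [wx * xi + wy * yi for xi, yi in zip(x, y)]

    if cur_q2 is not None:
        # 12.2 kbit/s mode: vectors for subframes 2 and 4 are transmitted
        q1 = mix(prev_q4, 0.5, cur_q2, 0.5)
        q3 = mix(cur_q2, 0.5, cur_q4, 0.5)
        return [q1, list(cur_q2), q3, list(cur_q4)]

    # Other modes: only the 4th subframe vector is transmitted
    q1 = mix(prev_q4, 0.75, cur_q4, 0.25)
    q2 = mix(prev_q4, 0.5, cur_q4, 0.5)
    q3 = mix(prev_q4, 0.25, cur_q4, 0.75)
    return [q1, q2, q3, list(cur_q4)]
```

Each interpolated vector is then converted back to LP coefficients for its subframe.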

  17. Open-loop pitch analysis • Performed twice per frame (every 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes • Performed once per frame for the 5.15 and 4.75 kbit/s modes • The pre-processed signal is filtered with a perceptual weighting filter (flat and tilted variants) [Figure labels: original, weighted, unit circle]

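  18. Impulse response computation • The impulse response $h(n)$ of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z)\, A(z/\gamma_2)]$ is computed each subframe • It is needed for the search of the adaptive and fixed codebooks • Computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$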
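
The filtering chain can be sketched as follows, assuming the weighted synthesis filter has the form A(z/gamma1) / [Aq(z) A(z/gamma2)], where `a` holds the unquantized and `a_q` the quantized LP coefficients (both starting with 1.0). The helper names and the gamma values in the usage are illustrative assumptions.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i scaled by gamma**i."""
    return [coef * gamma ** i for i, coef in enumerate(a)]

def all_pole_filter(x, a):
    """Filter x through 1/A(z) with zero initial state."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y.append(acc)
    return y

def impulse_response(a, a_q, gamma1, gamma2, length=40):
    """h(n) for one subframe of the weighted synthesis filter."""
    x = bandwidth_expand(a, gamma1)[:length]
    x += [0.0] * (length - len(x))   # numerator coefficients extended by zeros
    through_synthesis = all_pole_filter(x, a_q)
    return all_pole_filter(through_synthesis, bandwidth_expand(a, gamma2))
```

For a toy first-order filter, `impulse_response([1.0, -0.5], [1.0, -0.5], 0.94, 0.6)` starts 1.0, 0.33, 0.114.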

  19. Adaptive codebook • The adaptive codebook search is performed on a subframe basis • The parameters are the delay and gain of the pitch filter • The codebook contains entries taken from the previously synthesized excitation signal
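
A deliberately simplified sketch of the search for delay and gain. It works directly on the excitation, tries integer delays only, and restricts delays to at least one subframe so that every candidate lies wholly in the past; the real codec searches in the weighted domain, uses fractional delays, and handles shorter delays. All names and the lag range defaults are assumptions.

```python
def adaptive_codebook_search(past_exc, target, min_lag=40, max_lag=143):
    """Return (delay, gain) maximizing normalized correlation, or None."""
    n = len(target)
    best = None
    for lag in range(max(min_lag, n), min(max_lag, len(past_exc)) + 1):
        start = len(past_exc) - lag
        cand = past_exc[start:start + n]       # codebook entry for this delay
        corr = sum(t * c for t, c in zip(target, cand))
        energy = sum(c * c for c in cand)
        if energy <= 0.0 or corr <= 0.0:
            continue
        score = corr * corr / energy           # equivalent to minimizing the error
        if best is None or score > best[0]:
            best = (score, lag, corr / energy)  # optimal gain for this delay
    if best is None:
        return None
    return best[1], best[2]
```

With a past excitation of period 50 and a target equal to 0.8 times its continuation, the search returns a delay of 50 and a gain close to 0.8.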

  20. Algebraic codebook • Encodes the random portion of the excitation signal • The periodic portion of the weighted residual is removed first, so only the random portion remains to be coded by the fixed codebook • The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech • Based on the interleaved single-pulse permutation (ISPP) design • A few sparse impulse sequences that are phase-shifted versions of each other • All the pulses have the same magnitude • Amplitudes are +1 or -1
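
To illustrate the interleaved track structure, the sketch below builds a sparse codevector from pulse descriptions. It assumes a 40-sample subframe divided into 5 interleaved tracks (track t holds positions t, t+5, t+10, ...), the layout commonly described for the 12.2 kbit/s mode; other modes use different layouts. The tuple format and function name are hypothetical.

```python
def build_codevector(pulses, subframe_len=40, num_tracks=5):
    """pulses: iterable of (track, slot, sign) with sign in {+1, -1}."""
    code = [0.0] * subframe_len
    for track, slot, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        pos = track + num_tracks * slot    # interleaved position on this track
        if not (0 <= track < num_tracks and 0 <= pos < subframe_len):
            raise ValueError("pulse outside the subframe")
        code[pos] += sign
    return code
```

For example, `build_codevector([(0, 0, 1), (1, 3, -1)])` places +1 at sample 0 and -1 at sample 16. Only the positions and signs need to be transmitted, which is what keeps the codebook index compact.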

  21. Speech decoder • Codebook parameters are decoded by table look-up • LSP coefficients are interpolated and converted to LP coefficients • Excitation = sum of the adaptive and fixed codebook vectors, each multiplied by its respective gain, in each subframe • Speech = excitation passed through the vocal tract (LP synthesis) filter • Perceived quality is enhanced by adaptive post-filtering
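
The excitation and synthesis steps listed on this slide can be sketched per subframe as below. Post-filtering is left out, `memory` is assumed to hold at least as many past output samples as the filter order (most recent last), and all names are illustrative.

```python
def synthesize_subframe(adaptive_vec, fixed_vec, gain_pitch, gain_code, a_q, memory):
    """Return (excitation, speech) for one subframe."""
    # Excitation: gain-scaled sum of adaptive and fixed codebook vectors
    exc = [gain_pitch * v + gain_code * c for v, c in zip(adaptive_vec, fixed_vec)]
    history = list(memory)
    speech = []
    for e in exc:
        # All-pole LP synthesis: s(n) = e(n) - sum_i a_i s(n - i)
        s = e - sum(a_q[i] * history[-i] for i in range(1, len(a_q)))
        history.append(s)
        speech.append(s)
    return exc, speech
```

In a full decoder the last samples of `speech` become the filter memory for the next subframe, and the excitation is appended to the adaptive codebook buffer.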

  22. Speech decoder

  23. Synthesis model

  24. Synthesis model • To reconstruct speech • A noise-like speech • A pitch filter model of the glottal vibrations • A linear prediction filter model of the vocal tract

  25. Post-processing • Adaptive post-filtering • Cascade of two filters: a formant postfilter and a tilt compensation filter • Updated every subframe of 5 ms • High-pass filter • Removes undesired low-frequency components • Cut-off frequency of 60 Hz is used • Up-scaling by a factor of 2 to compensate for the down-scaling by 2 applied to the input signal
