Speech & Audio Processing

Presentation Transcript

  1. Speech & Audio Processing: Speech & Audio Coding Examples

  2. A Simple Speech Coder
     • LPC Based Analysis Structure
     [Block diagram; block labels: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Residual, Quantization, Filter Coeffs]
     Veton Këpuska

  3. Windowing Analysis Stage
     • N: length of the analysis window, 10-30 msec

  4. Some Analysis Windows

  5. MATLAB Useful Functions
     • wintool
       • Use "doc wintool" for more information
     • window
       • Use "doc window" for the list of supported windows
     • Define your own window if needed, e.g.:
       • Sine window and Vorbis window
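The slide names the sine and Vorbis windows without giving their definitions. The following is a minimal Python sketch of the commonly used formulas, not the author's MATLAB code. The function names, the 8 kHz sampling rate and the 20 msec frame (chosen from the 10-30 msec range above) are illustrative assumptions.

```python
import math

def frame_length(fs_hz, duration_ms):
    """Number of samples in an analysis window of the given duration."""
    return int(round(fs_hz * duration_ms / 1000.0))

def sine_window(length):
    """Sine window: w[n] = sin(pi * (n + 0.5) / length)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def vorbis_window(length):
    """Vorbis window: w[n] = sin(pi/2 * sin^2(pi * (n + 0.5) / length))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / length) ** 2)
            for n in range(length)]

# Assumed example values: 8 kHz speech, 20 msec analysis window.
N = frame_length(8000, 20)
w_sine = sine_window(N)
w_vorbis = vorbis_window(N)
```

Both windows are power complementary, meaning w[n]^2 + w[n + N/2]^2 = 1, which is the property that makes them suitable for overlapped transforms such as the MDCT.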

  6. LPC Analysis Stage
     • LPC method described in:
       • Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.ppt
     • Summary:
       • Perform autocorrelation
       • Solve the system of equations with the Levinson-Durbin method
     • MATLAB help
       • doc lpc, etc.
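The two summary steps (autocorrelation, then the Levinson-Durbin recursion) can be sketched in plain Python as follows. This is an illustrative sketch, not the referenced chapter's derivation. It assumes the polynomial convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, which is the form MATLAB's lpc returns, and a signal with non-zero energy. The sign of the reflection coefficients differs between textbooks.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, ks, err): a = [1, a1, ..., ap], the reflection
    coefficients ks, and the final prediction error energy.
    Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, ks, err

# Autocorrelation of a first-order process with correlation 0.5:
a, ks, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion finds a first-order predictor (a is approximately [1, -0.5, 0]) because the second reflection coefficient is zero.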

  7. Example of MATLAB Code
     function myLPCCodec(wavfile, N)
     % wavfile - input MS wav file
     % N       - LPC filter order

     % audioread replaces wavread, which newer MATLAB releases no longer provide
     [x, fs] = audioread(wavfile);
     % plot(x);

     % Play the original signal
     soundsc(x, fs);

     % LPC analysis using the MATLAB lpc function:
     % a = [1 a(2) ... a(N+1)], g = variance of the prediction error
     [a, g] = lpc(x, N);

     % Predicted samples computed from the estimated filter coefficients
     est_x = filter([0 -a(2:end)], 1, x);

     % Error (residual) signal, equivalent to filter(a, 1, x)
     e = x - est_x;

     % Listen to the predicted samples
     soundsc(est_x, fs);

     % Synthesis stage with zero loss of information:
     % the all-pole filter 1/A(z) driven by the residual rebuilds x
     syn_x = filter(1, a, e);
     soundsc(syn_x, fs);
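The "zero loss of information" claim can be checked without MATLAB. The sketch below, in plain Python, filters a short signal with A(z) to obtain the residual and then with 1/A(z) to rebuild it. The coefficient values and the test signal are made-up illustrations, and the function names are hypothetical.

```python
def analysis_filter(a, x):
    """Residual e[n] = sum_j a[j] * x[n - j], with a[0] == 1 (FIR filter A(z))."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesis_filter(a, e):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y

# Assumed example: A(z) with poles of 1/A(z) at 0.5 and 0.4 (stable).
a = [1.0, -0.9, 0.2]
x = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.3, 0.1]
residual = analysis_filter(a, x)
rebuilt = synthesis_filter(a, residual)
```

As long as the residual is passed on unquantized, rebuilt matches x to within floating-point rounding. Any loss in a real coder comes from quantizing the residual and the coefficients.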

  8. Analysis of Quantization Errors
     • Use MATLAB functions to study the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
       • Double (float64) representation (software emulation)
       • Float (float32) representation (software emulation)
       • Int (int32) representation (hardware emulation)
       • Short (int16) representation (hardware emulation)
     • Useful MATLAB functions:
       • fix, floor, round, ceil
     • Example:
       • sig_hat = fix(sig*2^(B-1))/2^(B-1);
       • Truncates sig to B bits.
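The truncation example translates directly to Python, where math.trunc rounds toward zero in the same way as MATLAB's fix. This is a small sketch that assumes the signal is scaled to the range (-1, 1); the function name is hypothetical.

```python
import math

def truncate_to_bits(value, bits):
    """Python counterpart of fix(sig*2^(B-1))/2^(B-1): keep B bits, round toward zero."""
    scale = 2 ** (bits - 1)
    return math.trunc(value * scale) / scale

samples = [0.3, -0.3, 0.999, -0.5]
quantized = [truncate_to_bits(s, 4) for s in samples]
errors = [s - q for s, q in zip(samples, quantized)]
```

With B = 4 the step size is 2^-3 = 0.125, so 0.3 becomes 0.25 and -0.3 becomes -0.25, and every error magnitude stays below one step.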

  9. Quantization of Error Signal & Filter Coefficients
     • ADPCM can be applied to the error signal.
     • Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
       • A small quantization error can have a large effect on the filter characteristics.
       • The issue is that the polynomial coefficients map non-linearly to the poles of the filter (i.e., the roots of the polynomial).
     • Alternative representations exist that have significantly better tolerance to quantization error.
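A second-order example makes the sensitivity concrete. The sketch below, in plain Python with made-up coefficient values, places a double pole at radius 0.95, truncates both direct-form coefficients to 4 fractional bits, and recomputes the poles with the quadratic formula.

```python
import cmath
import math

def quadratic_poles(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1/A(z) for a 2nd-order filter."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

def truncate(value, frac_bits):
    """Keep frac_bits fractional bits, rounding toward zero."""
    scale = 2 ** frac_bits
    return math.trunc(value * scale) / scale

# Assumed example: double pole at z = 0.95, i.e. (z - 0.95)^2.
a1, a2 = -1.9, 0.9025
q1, q2 = truncate(a1, 4), truncate(a2, 4)   # -1.875 and 0.875

radius_before = max(abs(p) for p in quadratic_poles(a1, a2))
radius_after = max(abs(p) for p in quadratic_poles(q1, q2))
```

Coefficient errors of about 0.03 move one pole from radius 0.95 onto the unit circle (the quantized polynomial factors as (z - 1)(z - 0.875)), so the filter is no longer strictly stable.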

  10. LPC Filter Representations
      • As noted previously, when the Levinson-Durbin algorithm was introduced, one alternative representation of the filter coefficients was also mentioned: PARCOR coefficients.
      • LPC to PARCOR:

  11. PARCOR Filter Representation
      • PARCOR to LPC:
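The conversion equations themselves are not preserved in this transcript. The sketch below shows the standard step-down (LPC to PARCOR) and step-up (PARCOR to LPC) recursions in plain Python. It assumes the convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p with k_i equal to the last coefficient of the order-i polynomial; the slides may use the opposite sign for k. The function names are hypothetical.

```python
def parcor_to_lpc(ks):
    """Step-up recursion: build [1, a1, ..., ap] from reflection coefficients."""
    a = [1.0]
    for k in ks:
        prev = a + [0.0]
        last = len(prev) - 1
        # a_j(i) = a_j(i-1) + k_i * a_{i-j}(i-1), and a_i(i) = k_i
        a = [prev[j] + k * prev[last - j] for j in range(len(prev))]
    return a

def lpc_to_parcor(a):
    """Step-down recursion: recover reflection coefficients from [1, a1, ..., ap]."""
    cur = [c / a[0] for c in a]
    ks = []
    for i in range(len(cur) - 1, 0, -1):
        k = cur[i]
        ks.append(k)
        denom = 1.0 - k * k
        if denom == 0.0:
            raise ValueError("reflection coefficient of magnitude 1")
        # a_j(i-1) = (a_j(i) - k_i * a_{i-j}(i)) / (1 - k_i^2)
        cur = [(cur[j] - k * cur[i - j]) / denom for j in range(i)]
    return ks[::-1]

coeffs = parcor_to_lpc([-0.5, 0.3, 0.2])
recovered = lpc_to_parcor(coeffs)
```

A filter is stable exactly when every |k_i| < 1, which is easy to preserve under quantization and is one reason PARCOR coefficients tolerate quantization better than direct-form coefficients.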

  12. Line Spectral Frequency Representation
      • It turns out that PARCOR coefficients can be represented with line spectral frequencies (LSF), which have significantly better properties.
      • Note that:
        • The PARCOR lattice structure of the LPC synthesis filter above:
        [Lattice diagram; labels: Input, Output, A_0 ... A_{p-1}, A_p, B_0 ... B_{p-1}, B_p, k_0 = -1, k_{p-1}, k_p, k_{p+1} = ∓1, delays z^-1]

  13. Line Spectral Frequency Representation
      • From the previous slide the following holds:
      • From this realization of the filter the LSP representation is derived:
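The derivation on these slides is not preserved in the transcript. As a hedged illustration, the sketch below uses the standard construction: extending the lattice with k_{p+1} = ∓1 gives the symmetric polynomial P(z) = A(z) + z^-(p+1) A(1/z) and the antisymmetric polynomial Q(z) = A(z) - z^-(p+1) A(1/z), so that A(z) = (P(z) + Q(z))/2. The line spectral frequencies are the angles in (0, pi) at which P and Q vanish on the unit circle. The grid-and-bisection root search, the grid size and all function names are illustrative assumptions, and the code assumes a minimum-phase A(z).

```python
import math

def lsp_polynomials(a):
    """Coefficients of P(z) and Q(z) for a = [1, a1, ..., ap]."""
    ext = list(a) + [0.0]            # A(z), padded to degree p + 1
    rev = [0.0] + list(a)[::-1]      # z^-(p+1) * A(1/z)
    p_poly = [u + v for u, v in zip(ext, rev)]
    q_poly = [u - v for u, v in zip(ext, rev)]
    return p_poly, q_poly

def _amplitude(c, w, antisymmetric):
    """Real amplitude of a (anti)symmetric polynomial on the unit circle."""
    centre = (len(c) - 1) / 2.0
    f = math.sin if antisymmetric else math.cos
    return sum(cn * f(w * (centre - n)) for n, cn in enumerate(c))

def line_spectral_frequencies(a, grid=512):
    """LSFs in radians, found by sign changes on a grid plus bisection."""
    found = []
    for c, anti in zip(lsp_polynomials(a), (False, True)):
        prev_w = math.pi / grid
        prev_f = _amplitude(c, prev_w, anti)
        for i in range(2, grid):
            w = math.pi * i / grid
            f = _amplitude(c, w, anti)
            if prev_f * f < 0.0:
                lo, hi, f_lo = prev_w, w, prev_f
                for _ in range(60):
                    mid = 0.5 * (lo + hi)
                    f_mid = _amplitude(c, mid, anti)
                    if f_lo * f_mid <= 0.0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                found.append(0.5 * (lo + hi))
            prev_w, prev_f = w, f
    return sorted(found)

a = [1.0, -0.59, 0.17, 0.2]          # assumed example, reflection coeffs -0.5, 0.3, 0.2
p_poly, q_poly = lsp_polynomials(a)
lsf = line_spectral_frequencies(a)
```

For this third-order example the search returns three frequencies, alternating between roots of P and Q. That interlacing is what makes a quantized LSF set easy to check for stability.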

  14. LSF Representation

  15. LPC Synthesis Filter with LSF

  16. A Simple Speech Coder
      • LPC Based Synthesis Structure
      [Block diagram; block labels: Residual Signal, De-emphasis, Synthesis Filter, Audio Output, Residual, Decoding, Filter Coeffs]
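The pre-emphasis block of the analysis structure and the de-emphasis block of the synthesis structure form an inverse pair. The sketch below shows the usual first-order form in plain Python. The coefficient 0.95 is an assumed, typical value; the slides do not state one.

```python
def pre_emphasis(x, alpha=0.95):
    """y[n] = x[n] - alpha * x[n-1]: first-order high-frequency boost."""
    return [x[n] - (alpha * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def de_emphasis(y, alpha=0.95):
    """x[n] = y[n] + alpha * x[n-1]: exact inverse of pre_emphasis."""
    out = []
    prev = 0.0
    for value in y:
        prev = value + alpha * prev
        out.append(prev)
    return out

signal = [0.0, 0.2, 0.5, 0.4, -0.1, -0.6, -0.3, 0.1]
restored = de_emphasis(pre_emphasis(signal))
```

Applying de_emphasis to the pre-emphasized signal restores the input to within rounding, which is why the decoder can undo the encoder's spectral tilt exactly.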

  17. Audio Coding

  18. Audio Coding
      • Most of the audio coding standards use principles of psychoacoustics.
      • Example of the basic structure of an MP3 encoder:
      [Block diagram; block labels: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream]

  19. Basic Structure of Audio Coders
      • Filterbank Processing
      • Psychoacoustic Model
      • Quantization

  20. Filter Bank Analysis Synthesis

  21. Filterbank Processing:
      • Splitting the full-band signal into several sub-bands:
        • Uniform sub-bands (FFT)
        • Critical band (FFT followed by a non-linear transformation)
      • Reflects the human auditory apparatus.
      • Mel-scale and Bark-scale transformations

  22. Mel-Scale

  23. Bark-Scale
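The Mel-scale and Bark-scale slides carried only figures. As a hedged illustration, the sketch below implements two widely used closed-form approximations in plain Python: the 2595 log10(1 + f/700) Mel formula and the Zwicker-Terhardt arctangent Bark formula. The slides may have used different variants.

```python
import math

def hz_to_mel(f_hz):
    """Common Mel approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

freqs = [100.0, 500.0, 1000.0, 4000.0, 8000.0]
mels = [hz_to_mel(f) for f in freqs]
barks = [hz_to_bark(f) for f in freqs]
```

Both scales are roughly linear at low frequencies and compress toward high frequencies. By construction 1000 Hz maps to about 1000 mel.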

  24. Analysis Structure of Filterbank
      • h_k[n]: impulse response of the k-th quadrature mirror filter
      • N: number of channels, typically 32
      • ↓: down-sampling
      • MDCT: Modified Discrete Cosine Transform
      [Block diagram; labels: Audio Input, h_1[n] ... h_k[n] ... h_N[n], ↓, MDCT, Quantization, Bit Stream]

  25. Synthesis Structure of Filterbank
      • g_k[n]: impulse response of the k-th inverse quadrature mirror filter
      • N: number of channels, typically 32
      • ↑: up-sampling
      • IMDCT: Inverse Modified Discrete Cosine Transform
      [Block diagram; labels: Bit Stream, Decoding, IMDCT, ↑, g_1[n] ... g_k[n] ... g_N[n], Audio Output]
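The MDCT/IMDCT pair in these two structures reconstructs the signal through overlap-add of 50%-overlapped frames (time-domain aliasing cancellation). The sketch below demonstrates this on a single band in plain Python, leaving out the quadrature mirror filters and the quantizer. It assumes a sine window applied at both analysis and synthesis, an even hop size, and a 2/N scale in the inverse transform; the function names are hypothetical.

```python
import math

def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """2N windowed samples -> N coefficients."""
    half = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (2/N scale for windowed use)."""
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
                             for k in range(half))
            for n in range(2 * half)]

def mdct_round_trip(signal, half):
    """Analyse and resynthesise with 50% overlap-add; aliasing cancels between frames."""
    if len(signal) % half:
        raise ValueError("signal length must be a multiple of the hop size")
    window = sine_window(2 * half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        frame = [padded[start + n] * window[n] for n in range(2 * half)]
        rebuilt = imdct(mdct(frame))
        for n in range(2 * half):
            out[start + n] += rebuilt[n] * window[n]
    return out[half:half + len(signal)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
restored = mdct_round_trip(signal, 4)
```

Each frame alone is not invertible, since 2N samples map to N coefficients. The original samples reappear only after adjacent frames are summed.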

  26. Psycho-Acoustic Modeling

  27. Psychoacoustic Model
      • The masking threshold is determined according to human auditory perception.
      • The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
      • Analysis is done in the frequency domain, represented by the DFT and computed with the FFT.

  28. Threshold of Hearing
      • Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
      • Any signal below the threshold can be removed without affecting perception.

  29. Threshold of Hearing
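The threshold curve on this slide is not preserved in the transcript. A widely used closed-form approximation of the threshold in quiet is Terhardt's formula, sketched below in plain Python. Whether the slide plotted this exact formula is an assumption, and the helper is_inaudible_in_quiet is a hypothetical name.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold, in dB SPL."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def is_inaudible_in_quiet(f_hz, level_db_spl):
    """True if a component lies below the threshold and could be dropped."""
    return level_db_spl < threshold_in_quiet_db(f_hz)

levels = {f: threshold_in_quiet_db(f) for f in (100.0, 1000.0, 3300.0, 16000.0)}
```

The curve is lowest around 3-4 kHz, where hearing is most sensitive, and rises steeply toward both ends of the audible range.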

  30. Frequency Masking
      • Schröder Spreading Function
      • Bark Scale Function:
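The formulas for this slide are missing from the transcript. The sketch below uses the forms usually attributed to Schroeder: the Bark mapping z = 7 asinh(f/650) and the spreading function 15.81 + 7.5(dz + 0.474) - 17.5 sqrt(1 + (dz + 0.474)^2) in dB, where dz is the maskee's Bark value minus the masker's. Treat both as assumptions about what the slide showed. The function names are hypothetical, and a complete masking threshold would also subtract a masker-dependent offset that is not modelled here.

```python
import math

def hz_to_bark_schroeder(f_hz):
    """Schroeder's Bark approximation: z = 7 * asinh(f / 650)."""
    return 7.0 * math.asinh(f_hz / 650.0)

def schroeder_spreading_db(dz_bark):
    """Spread of masking in dB at a distance dz (maskee minus masker, in Bark)."""
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def spread_level_db(masker_hz, maskee_hz):
    """Relative masking contribution of a masker at the maskee frequency."""
    dz = hz_to_bark_schroeder(maskee_hz) - hz_to_bark_schroeder(masker_hz)
    return schroeder_spreading_db(dz)

# Tone pairs named in the deck: a 1 kHz primary tone with 900 Hz and 5 kHz tones.
near = spread_level_db(1000.0, 900.0)
far = spread_level_db(1000.0, 5000.0)
```

The function falls off at about 25 dB per Bark below the masker and 10 dB per Bark above it. The 900 Hz tone, less than one Bark from the 1 kHz masker, receives only a few dB of attenuation, while the 5 kHz tone, more than ten Bark away, is essentially outside the masker's influence.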

  31. Masking Curve

  32. Primary Tone 1 kHz

  33. Masked Tone 900 Hz

  34. Combined Sound 1 kHz + 0.9 kHz

  35. Combined 1 kHz + 0.9 kHz (-10 dB)

  36. Combined 1 kHz + 5 kHz (-10 dB)