
Introduction to Speech Signal Processing

Dr. Zhang Sen (zhangsen@gscas.ac.cn), Chinese Academy of Sciences, Beijing, China. 2014/9/10.


Presentation Transcript


  1. Introduction to Speech Signal Processing. Dr. Zhang Sen (zhangsen@gscas.ac.cn), Chinese Academy of Sciences, Beijing, China. 2014/9/10

  2. Outline: Introduction (sampling and quantization; speech coding) • Features and Analysis (main features; some transformations) • Text-to-Speech (state of the art; main approaches) • Speech-to-Text (state of the art; main approaches) • Applications (human-machine dialogue systems)

  3. The speech signal viewed mathematically: it can be described by a continuous function, but an explicit analytical form is hard to find; it is non-linear, non-stationary, and time-varying; some parts resemble noise, others a pseudo-periodic signal. The speech signal viewed physically: a wave generated by vibration and transmitted through air or another medium.

  4. Analysis approaches: divide-and-conquer; approximation and simplicity; transformation between the time domain and the frequency domain (TD-FD). Analysis purpose: to find speech features; to decide which are important and which are trivial; to find the correlation between features; to see how features change; to learn how to change the original signal.

  5. Features can be classified as time-domain or frequency-domain features, or as short-term or long-term features. Feature representation: numerical (vector or distribution) or diagram (curve or image).

  6. Windowing (framing): within a short frame (10 ms to 25 ms), the non-stationary signal can be treated as stationary and the non-linear signal as linear.
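The framing step described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the 8 kHz sampling rate, the 20 ms frame, the 10 ms hop, the Hamming window, and the names `hamming` and `frame_signal` are all assumptions chosen for the example.

```python
import math

def hamming(n_points):
    """Return a Hamming window of length n_points."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]

def frame_signal(signal, frame_len, hop):
    """Cut the signal into overlapping frames and window each one."""
    window = hamming(frame_len)
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# Assumed settings: 8 kHz sampling, 20 ms frames, 10 ms hop.
sample_rate = 8000
frame_len = sample_rate * 20 // 1000   # 160 samples
hop = sample_rate * 10 // 1000         # 80 samples
signal = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
frames = frame_signal(signal, frame_len, hop)
```

Each entry of `frames` is short enough to be analysed as if it were stationary, which is the point of the 10 ms to 25 ms range on the slide.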

  7. Window types

  8. Window shapes

  9. A few words on Window function

  10. Commonly used speech features: zero-crossing rate (ZCR); peaks; power and energy; correlation, auto-correlation, and AMDF (average magnitude difference function); formants; pitch; frequency spectrum; cepstrum and MFCC (mel-frequency cepstral coefficients); linear predictive coefficients (LPC) and LPCC.

  11. ZCR
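Only the title of this slide survives in the transcript. As a rough sketch of what the zero-crossing rate usually measures, the function below counts sign changes between adjacent samples of one frame. The normalisation by the number of sample pairs and the name `zero_crossing_rate` are assumptions, not taken from the slide.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)   # zero is treated as non-negative
    )
    return crossings / (len(frame) - 1)
```

Noise-like (unvoiced) frames tend to give a high value and pseudo-periodic (voiced) frames a low one, which is why the feature is often used for voiced/unvoiced decisions.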

  12. Level-crossing-rate

  13. Peaks

  14. Power and energy
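The slide body is not in the transcript. Under the common definitions (an assumption here), short-time energy is the sum of squared samples in a frame and short-time power is that energy divided by the frame length. The function names are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def short_time_power(frame):
    """Energy averaged over the frame length (0.0 for an empty frame)."""
    return short_time_energy(frame) / len(frame) if frame else 0.0
```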

  15. Correlation, auto-correlation, AMDF: used to measure the similarity of two signals or to detect the periodicity of a signal. The auto-correlation is $R_k(m) = \sum_i x(k+i)\,x(k+m+i)$, summed over a range of $i$, where $k$ is the reference point and $m$ is the lag.
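The sum on this slide translates almost directly into code. The sketch below follows the slide's indexing (reference point `k`, lag `m`) and assumes the summation range is `i = 0 .. n-1`. The AMDF form, the mean of `|x(k+i) - x(k+m+i)|`, is the usual definition and is an assumption here, since the slide does not spell it out.

```python
def autocorrelation(x, k, m, n):
    """Sum of x[k+i] * x[k+m+i] for i = 0 .. n-1."""
    return sum(x[k + i] * x[k + m + i] for i in range(n))

def amdf(x, k, m, n):
    """Mean of |x[k+i] - x[k+m+i]| for i = 0 .. n-1."""
    return sum(abs(x[k + i] - x[k + m + i]) for i in range(n)) / n

# A toy signal with a period of 4 samples.
x = [0, 1, 0, -1] * 8
r_period = autocorrelation(x, 0, 4, 16)   # lag equal to the period
r_half = autocorrelation(x, 0, 2, 16)     # lag equal to half the period
d_period = amdf(x, 0, 4, 16)
d_half = amdf(x, 0, 2, 16)
```

At a lag equal to the period the auto-correlation peaks and the AMDF drops to zero, which is the property pitch detectors exploit.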

  16. Center-clipping technique

  17. Auto-correlation peaks

  18. Auto-correlation show

  19. Formant LPC->FFT

  20. Formant displays

  21. Some typical formant values

  22. Pitch (fundamental frequency): referred to as F0; it determines tone and prosody. Pitch estimation methods: auto-correlation and AMDF; cepstrum; LPC; peak detection. Pitch smoothing methods: dynamic programming; N-point smoothing filter; HMM.
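Of the estimation methods listed on this slide, the auto-correlation one is the simplest to sketch: search the lags that correspond to plausible F0 values and take the lag with the largest auto-correlation. The 50 Hz to 500 Hz search range and the name `estimate_pitch` are assumptions, and no smoothing or voicing decision is included.

```python
import math

def estimate_pitch(frame, sample_rate, f0_min=50.0, f0_max=500.0):
    """Return an F0 estimate in Hz, or 0.0 if no positive peak is found."""
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Auto-correlation of the frame with itself shifted by `lag` samples.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0

# Synthetic test tone: 200 Hz sampled at 8 kHz (period of 40 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
f0 = estimate_pitch(frame, sample_rate)
```

On real speech the raw estimate jumps between frames, for example to half or double the true F0, which is why the slide pairs estimation with smoothing methods.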

  23. Pitch show: the pitch of a3 by the auto-correlation method

  24. Spectrogram: a representation of a signal that highlights several of its properties, based on short-time Fourier analysis. Two dimensions: time on the horizontal axis and frequency on the vertical axis. Third 'dimension': gray or color level indicating energy.
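A spectrogram as described here is one power spectrum per windowed frame. The sketch below uses a direct DFT so that it needs only the standard library. The frame length, hop, Hamming window, and function names are assumptions for illustration.

```python
import cmath
import math

def power_spectrum(frame):
    """Squared DFT magnitude for bins 0 .. N/2 of one real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        value = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(value) ** 2)
    return spectrum

def spectrogram(signal, frame_len, hop):
    """One power spectrum per frame: rows are time, columns are frequency."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        rows.append(power_spectrum(frame))
    return rows

# A 1 kHz tone sampled at 8 kHz; with 64-point frames each bin is 125 Hz wide.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Plotting `spec` with time along one axis, frequency along the other, and the values mapped to gray or color levels gives the display the slide describes.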

  25. Spectrum of a frame (vowel)

  26. Spectrum of a frame (consonant)

  27. Cepstrum analysis

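  28. Cepstrum and MFCC computation (block diagram). Cepstrum path: s(n) → DFT → log|DFT| → IDFT → cepstrum. MFCC path: the spectrum is passed through a filter-bank and a DCT to give the MFCC.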
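The cepstrum path on this slide (DFT, log magnitude, inverse DFT) can be sketched with a direct DFT. This computes the real cepstrum only. The small floor inside the logarithm is an added safeguard against log(0), and the function names are assumptions. The MFCC path is not shown.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT, or inverse DFT (with 1/N scaling) when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame, floor=1e-12):
    """IDFT of the log magnitude spectrum of one frame."""
    spectrum = dft(frame)
    log_magnitude = [math.log(max(abs(v), floor)) for v in spectrum]
    return [v.real for v in dft(log_magnitude, inverse=True)]

# A scaled unit impulse has a flat spectrum, so only c[0] is non-zero.
c = real_cepstrum([2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

For voiced speech the low-order cepstral values describe the spectral envelope, while a peak at higher index reflects the pitch period, which is why the cepstrum also appears among the pitch estimation methods.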

  29. Filter-bank

  30. Perceptual measures

  31. Linear predictive analysis

  32. Prediction errors
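The bodies of the linear prediction slides are missing from the transcript. As a hedged sketch of one standard approach, the code below estimates predictor coefficients with the auto-correlation method and the Levinson-Durbin recursion, then forms the prediction error e(n) = x(n) - Σ a_k x(n-k). Whether the presenter used this method is not known, and the function names are assumptions.

```python
def autocorr(x, max_lag):
    """Auto-correlation values r[0] .. r[max_lag] of a finite sequence."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, error) with x[n] predicted as sum of a[k-1] * x[n-k]."""
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        if error == 0:
            break
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        error *= (1 - k * k)
    return a, error

def prediction_error(x, a):
    """Residual e[n] = x[n] - sum of a[k] * x[n-1-k], using zeros before n = 0."""
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

# Impulse response of a one-pole filter: x[n] = 0.9 ** n.
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorr(x, 2), 2)
e = prediction_error(x, a)
```

For this toy signal the recursion recovers a first coefficient close to 0.9 and a second close to zero, and the residual is close to the single impulse that excited the filter.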

  33. LP coefficients to cepstral coefficients: the computation of LPCC. LPCC is often used as the feature vector in ASR.
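The formula on this slide did not survive extraction. A commonly used recursion, given here as an assumption rather than as the slide's own equation, converts predictor coefficients a_1 .. a_p of the all-pole model H(z) = G / (1 - Σ a_k z^-k) into cepstral coefficients c_1, c_2, and so on. The gain term c_0 is omitted, and `lpc_to_cepstrum` is a hypothetical name.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Return [c_1, ..., c_n_ceps] for predictor coefficients a = [a_1, ..., a_p]."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # c_n = a_n + sum over k of (k / n) * c_k * a_(n-k), with a_n = 0 for n > p.
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

# For a single pole at 0.5 the exact cepstrum is c_n = 0.5 ** n / n.
c = lpc_to_cepstrum([0.5], 4)
```

The recursion avoids computing a DFT and a logarithm, which is one reason LPCC features are cheap to derive once the LPC analysis is done.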

  34. Some transformations in SSP. DFT, FFT, DCT and their inverses: frequency analysis, TD-FD conversion. Z transformation: LPC analysis, filter design. Wavelet transformation: frequency analysis, compression.

  35. Fourier Transform

  36. Discrete Fourier Transform: the computational load of the DFT is O(N^2); the Fast Fourier Transform (FFT) reduces it to O(N log N) using the divide-and-conquer principle.
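The divide-and-conquer idea can be seen by placing a direct DFT next to a recursive radix-2 FFT. This is a teaching sketch that assumes the input length is a power of two. It is not an optimised implementation, and the function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct DFT: N output bins, each a sum over N samples, so O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT: two half-size transforms plus O(N) merging."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
same = all(abs(p - q) < 1e-9 for p, q in zip(dft(x), fft(x)))
```

Each level of recursion does O(N) work and there are log2(N) levels, which gives the O(N log N) cost quoted on the slide.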

  37. Basic phonetic knowledge: consonant/unvoiced; vowel/voiced; co-articulation; phone and phoneme; uni-, bi-, and tri-phones; canonical form, surface form, and reduced form; tone and prosody.

  38. Co-articulation: very common in English, where it causes many difficulties in ASR; in Mandarin it is not very serious. The use of bi-phones and tri-phones is intended to cope with this issue. Some examples: Mandarin: A yi, yi yi, wu yun, … English: this issue, in a box, …

  39. Some research topics: speech signal detection and endpoint detection; consonant/vowel separation; pitch estimation; echo cancellation; de-noising and filter design; multi-signal separation; robust features; perceptual features; re-sampling and re-construction; etc.

  40. References: Jurafsky & Martin, Speech & Language Processing, Prentice Hall, 2000. X. D. Huang et al., Spoken Language Processing, Prentice Hall, Inc., 2000. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1999. Manning & Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993. Dr. J. Picone, Speech Website, www.isip.msstate.edu

  41. Test. Mode: a final 4-page report or a 30-min presentation. Content: review of speech processing; speech features and processing approaches; review of TTS or ASR; audio in computer engineering.

  42. THANKS
