
Gammachirp Auditory Filter

Gammachirp Auditory Filter. Alex Park, May 7th, 2003. Project Overview. Goal: Investigate use of (non-linear) auditory filters for speech analysis. Background: Sound analysis in auditory periphery similar to wavelet transform. Comparison: Traditional Short-Time Fourier analysis


Presentation Transcript


  1. Gammachirp Auditory Filter
     Alex Park, May 7th, 2003

  2. Project Overview
     • Goal:
       • Investigate use of (non-linear) auditory filters for speech analysis
     • Background:
       • Sound analysis in auditory periphery similar to wavelet transform
     • Comparison:
       • Traditional Short-Time Fourier analysis
       • Gammatone wavelet based analysis (auditory filter)
     • Extension:
       • Gammachirp filter has level-dependent parameters which can model non-linear characteristics of auditory periphery
     • Implementation:
       • Specifics of Gammachirp implementation
       • How to incorporate level dependency

  3. Auditory Physiology
     • Sound pressure variation in the air is transduced through the outer and middle ears onto the end of the cochlea
     • The basilar membrane, which runs throughout the cochlea, maps place of maximal displacement to frequency
     [Figure labels: Outer ear, Middle ear, Cochlea, Auditory Nerve, Cortex, Basilar Membrane, Low freq (200 Hz), High freq (20 kHz)]

  4. Motivation: Why better auditory models?
     • Automatic Speech Recognition (ASR)
       • ASR systems perform adequately in ‘clean’ conditions
       • Robustness is a major problem; degradation in low SNR conditions is much worse than for humans
     • Hearing research
       • Build better hearing aids and cochlear implants
       • Hearing impaired subjects with damaged cochlea have trouble understanding speech in noisy environments
       • Current hearing aids perform linear amplification, so they amplify noise as well as the signal
     • Is the lack of compressive non-linearity in the front-end a common link?

  5. Non-stationary Nature of Speech
     • Why is speech a good candidate for local frequency analysis?
     [Figure: Waveform of the word “tapestry”; segment labels: /t/, /ae/, /s/; annotations: tone, transient, noise]

  6. Time-Frequency Representation
     • The most common way of representing changing spectral content is the Short Time Fourier Transform (STFT)
     [Figure labels: FFT, Power]

  7. Spectrogram from STFT “tapestry”

  8. STFT Characteristics
     • We can think of the STFT as filtering using the following basis
     [Figure: STFT basis functions]
     • In the frequency domain, we are using a filterbank consisting of linearly spaced, constant bandwidth filters
     [Figure: filterbank frequency responses; axis: Freq (Hz)]
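The filterbank view of the STFT can be checked numerically. The following is a minimal sketch, assuming a Hann window and hypothetical frame and hop sizes (none of these values come from the slides). It shows that the analysis bins are linearly spaced, with the same spacing at every frequency.

```python
import numpy as np

def stft_power(x, fs, frame_len=256, hop=128):
    """Hann-windowed STFT power; frame_len and hop are illustrative choices."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)      # bin center frequencies
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return power, freqs, times

# Example: a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
power, freqs, times = stft_power(np.sin(2 * np.pi * 1000 * t), fs)
bin_spacing = np.diff(freqs)            # constant: fs / frame_len
peak_hz = freqs[np.argmax(power.mean(axis=0))]
```

Every bin has the same spacing and, because one window is shared by all bins, the same bandwidth. That is the property the auditory filterbanks on the following slides give up.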

  9. Auditory Filterbanks
     • Unlike the STFT, physiological data indicates that auditory filters:
       • are spaced more closely at lower frequencies than at high frequencies
       • have narrower bandwidths at lower frequencies (constant-Q)
     • The Gammatone filter bank proposed by Patterson models these characteristics using a wavelet transform.
     • The mother wavelet, or kernel function, is
     [Equation: Gammatone kernel, with labeled terms: Tone carrier, Gamma Envelope]
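The kernel equation did not survive in this transcript. A commonly used form of the gammatone kernel is g(t) = t^(n-1) exp(-2π b ERB(fc) t) cos(2π fc t + φ), where the first two factors are the gamma envelope and the cosine is the tone carrier. The sketch below assumes that form, the Glasberg and Moore ERB formula, and the conventional values n = 4 and b = 1.019. These are assumptions and may differ from what the slide showed.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammatone_ir(fc, fs, n=4, b=1.019, duration=0.05, phase=0.0):
    """Gammatone impulse response: gamma envelope times tone carrier.

    n, b and duration are assumed, conventional values.
    """
    t = np.arange(int(duration * fs)) / fs
    envelope = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    carrier = np.cos(2 * np.pi * fc * t + phase)
    g = envelope * carrier
    return t, g / np.max(np.abs(g))     # peak-normalized for plotting

# Example: the 1 kHz channel at a 16 kHz sampling rate.
fs, fc = 16000, 1000.0
t, g = gammatone_ir(fc, fs)
spectrum = np.abs(np.fft.rfft(g))
peak_hz = np.fft.rfftfreq(len(g), d=1.0 / fs)[np.argmax(spectrum)]
```

Since ERB(fc) grows with fc, the envelope decays faster in high-frequency channels, so those filters are shorter in time and wider in frequency.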

  10. Gammatone Characteristics
     • Unlike the STFT, the Gammatone filterbank uses the following basis
     [Figure: Gammatone basis functions]
     • The corresponding frequency responses are
     [Figure: Gammatone filterbank frequency responses; axis: Freq (Hz)]
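The spacing and bandwidth pattern of such a filterbank can be sketched by placing center frequencies uniformly on the ERB-rate scale. The frequency range and channel count below are hypothetical, and the ERB and ERB-rate formulas are the Glasberg and Moore ones, which the slides do not state explicitly.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_lo, f_hi, n_channels):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n_channels)
    return erb_rate_to_hz(e)

# Hypothetical 32-channel bank covering 100 Hz to 6 kHz.
fcs = center_frequencies(100.0, 6000.0, 32)
bandwidths = erb(fcs)
spacing = np.diff(fcs)   # grows with frequency, unlike the STFT bins
```

Both the spacing between neighboring channels and the channel bandwidths increase with center frequency, in contrast to the linearly spaced, constant bandwidth STFT filters.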

  11. What are we missing?
     • The Gammatone filterbank has constant-Q bandwidths and logarithmic spacing of center frequencies
     • Also, the rapidly decaying Gamma envelope gives effectively compact support
     • But, the filters are 1) symmetric and 2) linear
     • Psychophysical experiments indicate that auditory filter shapes are:
       1) Asymmetric
          • Sharper drop-off on high frequency side
       2) Non-linear
          • Filter shape and gain change depending on input level
          • Compressive non-linearity of the cochlea
          • Important for hearing in noise and for dynamic range

  12. Gammachirp Characteristics
     • The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel
     [Equation: Gammachirp kernel, with labeled terms: Chirp term, Gamma Envelope, Tone carrier]
     • Frequency response is asymmetric, can fit passive filter
     • Level-dependent parameters can fit changes due to stimulus
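The kernel equation is missing from this transcript. In the form published by Irino and Patterson, the gammachirp adds a c·ln(t) chirp term to the gammatone phase: g(t) = t^(n-1) exp(-2π b ERB(fr) t) cos(2π fr t + c ln t + φ) for t > 0. The sketch below assumes that form. The values of n, b and c are illustrative only, not the fitted values from the paper.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp_ir(fr, fs, n=4, b=1.019, c=-2.0, duration=0.05, phase=0.0):
    """Gammachirp impulse response; n, b and c are illustrative values."""
    # Start one sample after zero so that ln(t) is defined.
    t = np.arange(1, int(duration * fs) + 1) / fs
    envelope = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fr) * t)
    # The c * ln(t) chirp term is the only change from the gammatone carrier.
    carrier = np.cos(2 * np.pi * fr * t + c * np.log(t) + phase)
    return t, envelope * carrier

def peak_frequency(g, fs, n_fft=16384):
    """Frequency (Hz) of the largest magnitude bin of a zero-padded FFT."""
    mag = np.abs(np.fft.rfft(g, n=n_fft))
    return np.fft.rfftfreq(n_fft, d=1.0 / fs)[np.argmax(mag)]

fs, fr = 16000, 1000.0
t, chirp = gammachirp_ir(fr, fs, c=-2.0)
_, tone = gammachirp_ir(fr, fs, c=0.0)   # c = 0 reduces to a gammatone
peak_chirp = peak_frequency(chirp, fs)
peak_tone = peak_frequency(tone, fs)
```

With c < 0 the instantaneous frequency, fr + c / (2π t), starts below fr and glides up toward it, which pulls the spectral peak below fr and skews the response.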

  13. Implementation
     • Looking in the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric filter
     • To fit psychophysical data, a fixed Gammachirp is cascaded with level-dependent asymmetric IIR filters
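The first bullet can be illustrated with magnitude responses. In Irino and Patterson's analysis, the gammachirp magnitude is approximately the gammatone magnitude multiplied by an asymmetric term exp(c·θ(f)), with θ(f) = arctan((f - fr) / (b ERB(fr))). The sketch below assumes that approximation and illustrative parameter values. It does not implement the level-dependent IIR stage from the second bullet.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammatone_mag(f, fr, n=4, b=1.019):
    """Gammatone magnitude response, normalized to 1 at fr (symmetric)."""
    x = (f - fr) / (b * erb(fr))
    return (1.0 + x ** 2) ** (-n / 2.0)

def asymmetric_mag(f, fr, b=1.019, c=-2.0):
    """Asymmetric term exp(c * theta(f)); c is an illustrative value."""
    theta = np.arctan((f - fr) / (b * erb(fr)))
    return np.exp(c * theta)

def gammachirp_mag(f, fr, n=4, b=1.019, c=-2.0):
    """Cascade in the frequency domain: multiply the two magnitudes."""
    return gammatone_mag(f, fr, n, b) * asymmetric_mag(f, fr, b, c)

fr = 1000.0
f = np.arange(0.0, 4000.0, 1.0)
gc = gammachirp_mag(f, fr)
peak_hz = f[np.argmax(gc)]        # shifted below fr when c < 0
low_side = gammachirp_mag(fr - 200.0, fr)
high_side = gammachirp_mag(fr + 200.0, fr)
```

With c < 0, `high_side` is smaller than `low_side`, matching the sharper drop-off on the high frequency side. Under this approximation the peak moves to fr + c·b·ERB(fr)/n.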

  14. Comparison: Tone vs. Passive Chirp outputs
     • Gammatone output seems to have better frequency resolution
     • Passive Gammachirp output seems to have better time resolution

  15. Comparison: Tone vs. Active Chirp Outputs

  16. Incorporating level dependency
     • As illustrated in the previous slide, passive Gammachirp output offers little advantage on clean speech using fixed stimulus levels
     • We can incorporate parameter control via feedback
     [Flow diagram: Compute Passive GC Spectrogram → Segment into frames (S1, S2, ..., SN-1, SN) → For each time frame: Get stimulus level/channel → Filter w/ level specific filter → Reconstruct Frames]
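The frame loop in the flow diagram can be sketched as follows. The slides do not specify the level-specific filter, so the sketch uses a simple compressive gain as a stand-in. The function `level_specific_gain`, its `ref_db` and `slope` parameters, and the frame length are all hypothetical. The input is assumed to be the per-channel output of a passive Gammachirp filterbank computed elsewhere.

```python
import numpy as np

def frame_levels_db(frames, eps=1e-12):
    """RMS level in dB for each (channel, frame)."""
    rms = np.sqrt(np.mean(frames ** 2, axis=-1))
    return 20.0 * np.log10(rms + eps)

def level_specific_gain(level_db, ref_db=-20.0, slope=0.5):
    """Hypothetical stand-in for the level-specific filter.

    Applies a compressive gain so that the output level is
    ref_db + slope * (level_db - ref_db).
    """
    return 10.0 ** ((slope - 1.0) * (level_db - ref_db) / 20.0)

def process_level_dependent(channels, frame_len=160):
    """channels: (n_channels, n_samples) passive filterbank outputs."""
    n_channels, n_samples = channels.shape
    n_frames = n_samples // frame_len
    used = n_frames * frame_len            # drop the incomplete last frame
    # Segment each channel into non-overlapping frames.
    frames = channels[:, :used].reshape(n_channels, n_frames, frame_len)
    # Get the stimulus level per channel and per frame.
    levels = frame_levels_db(frames)
    # Apply the level-specific processing to each frame.
    out = frames * level_specific_gain(levels)[..., None]
    # Reconstruct by concatenating the processed frames.
    return out.reshape(n_channels, used), levels

# Example: two channels whose levels differ by 20 dB.
fs = 16000
t = np.arange(1650) / fs
tone = np.sin(2 * np.pi * 1000 * t)
out, levels = process_level_dependent(np.stack([tone, 0.1 * tone]))
out_levels = frame_levels_db(out.reshape(2, -1, 160))
```

With `slope=0.5`, the 20 dB difference between the two input channels shrinks to 10 dB at the output. Non-overlapping frames with a per-frame gain can introduce discontinuities at frame boundaries; overlapping windows would be one way to smooth them.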

  17. Sample outputs
     [Figure panels: Clean, 40 dB SNR, 30 dB SNR, 20 dB SNR]

  18. References
     • Bleeck, S., Patterson, R.D., and Ives, T. (2003). Auditory Image Model for Matlab. Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
     • Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data,” J. Acoust. Soc. Am. 109, 2008-2022.
     • Pickles, J.O. (1988). An Introduction to the Physiology of Hearing (Academic, London).
     • Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.
     • Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/

  19. Sidenote
     [Figure panels: Clean, 40 dB SNR, 30 dB SNR]
