Multi-band Frequency Compression for Reducing the Effects of Spectral Masking in Sensorineural Hearing Loss
P. C. Pandey & P. N. Kulkarni, EE Dept, IIT Bombay
07 Jan 2012
pnkulkarni@ee.iitb.ac.in
Signal Processing Approach
• The speech signal is windowed, its spectrum is divided into a number of bands, and the spectral components in each band are compressed towards the band center, to reduce the effects of increased spectral masking.
Objective of the Investigation
• To select the most appropriate combination of processing parameters
• To evaluate the improvement in speech perception by normal-hearing subjects with simulated hearing loss and by hearing-impaired subjects
Earlier study
Critical-band based multi-band frequency compression [Yasu et al., 2002, 2004]
▪ Magnitude spectrum compressed towards the center of each critical band and combined with the unaltered phase spectrum (segmentation with Hamming window, STFT, spectral modification, and overlap-add synthesis)
▪ Moderate improvement in the VCV recognition score for hearing-impaired subjects (unprocessed: 35.4%, processed: 38.3%)
Proposed technique
Multi-band compression applied to the complex spectrum, by compressing spectral samples towards band centers, with overlap-add based analysis-synthesis
Input speech (fs = 10 kHz) → Segmentation (50% overlap) → Spectral analysis & modification (zero padding & FFT; compression of complex spectral samples in a set of predefined bands towards the band center by a fixed compression factor, CF) → Resynthesis (IFFT, overlap-add) → Processed speech
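The analysis-synthesis chain above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `overlap_add_process`, the `modify` callback, the periodic Hann window, and the FFT length of 256 are all hypothetical choices (the slides give only fs = 10 kHz, 20 ms frames, 50% overlap, and zero padding).

```python
import numpy as np

def overlap_add_process(x, frame_len=200, nfft=256, modify=lambda spec: spec):
    """Window, zero-pad, FFT, modify the spectrum, IFFT, and overlap-add.

    frame_len=200 is 20 ms at fs = 10 kHz; nfft=256 is an assumed padded length.
    `modify` is a hypothetical hook where band-wise compression would go.
    """
    hop = frame_len // 2                       # 50% overlap
    # Assumed window: periodic Hann, whose 50%-overlapped copies sum to 1.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    y = np.zeros(len(x) + nfft)
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        spec = np.fft.rfft(frame, n=nfft)      # zero padding happens here
        spec = modify(spec)                    # spectral modification step
        y[start:start + nfft] += np.fft.irfft(spec, n=nfft)
    return y[:len(x)]

# With the default (identity) modification the interior of the signal
# is reconstructed unchanged, which is a useful sanity check.
x = np.sin(2 * np.pi * 440 * np.arange(1000) / 10000)
y = overlap_add_process(x)
print(np.allclose(y[100:900], x[100:900]))
```

The identity check only confirms that segmentation and resynthesis are transparent; the first and last half-frames are covered by a single window and are therefore attenuated in this sketch.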
Parameters affecting quality & intelligibility
▪ Bandwidth ▪ Segmentation ▪ Frequency mapping ▪ Compression factor
Bandwidth
▪ Constant bandwidth (18 bands, BW: 278 Hz)
▪ 1/3-octave bandwidth
▪ Auditory critical bandwidth (ACB)
Segmentation
▪ Fixed-frame: 20 ms, 50% overlap
▪ Pitch-synchronous: two local pitch periods, with an overlap of one local pitch period
BW = ACB, compression factor = 0.6
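As a rough check on the bandwidth options, the sketch below derives the constant-bandwidth band edges and evaluates one common critical-bandwidth approximation. The function names are hypothetical, and the Zwicker-Terhardt formula is an assumption: the slides do not say which expression for the auditory critical bandwidth was used.

```python
def constant_band_edges(f_max=5000.0, n_bands=18):
    """Edges of equal-width bands covering 0 to f_max (fs/2 for fs = 10 kHz)."""
    bw = f_max / n_bands
    return [i * bw for i in range(n_bands + 1)]

def critical_bandwidth(f_hz):
    """Critical bandwidth in Hz at frequency f_hz.

    Assumed formula (Zwicker-Terhardt approximation), not taken from the slides.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

edges = constant_band_edges()
print(round(edges[1]))                 # width of each constant band, in Hz
print(round(critical_bandwidth(1000))) # approximate critical bandwidth at 1 kHz
```

Dividing the 0 to 5 kHz range into 18 equal bands gives about 278 Hz per band, which matches the figure quoted for the constant-bandwidth case.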
Frequency mapping
▪ Sample-to-sample ▪ Superimposition of spectral samples ▪ Spectral segment
Sample-to-sample: irregular variation in spectrum and signal energy
Superimposition of spectral samples: irregular variation in spectrum and signal energy, partly compensated
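The difference between the first two mappings can be illustrated on a single band. The sketch below is one plausible reading, not the authors' code: it assumes that sample-to-sample mapping copies, for each output bin, the single input bin it maps back to, and that superimposition adds every input bin into the output bin it maps forward to. The function names and the 11-bin toy band are hypothetical.

```python
import math

def nearest(v):
    """Round half up (Python's built-in round() uses banker's rounding)."""
    return math.floor(v + 0.5)

def sample_to_sample(X, lo, hi, c):
    """Each output bin copies the one input bin it maps back to (assumed)."""
    kc = (lo + hi) / 2                       # band center
    Y = [0] * len(X)
    for k_out in range(lo, hi + 1):
        k_in = nearest(kc + (k_out - kc) / c)
        if lo <= k_in <= hi:                 # outside the band: leave zero
            Y[k_out] = X[k_in]
    return Y

def superimpose(X, lo, hi, c):
    """Each input bin is added into the output bin it maps forward to (assumed)."""
    kc = (lo + hi) / 2
    Y = [0] * len(X)
    for k_in in range(lo, hi + 1):
        Y[nearest(kc + c * (k_in - kc))] += X[k_in]
    return Y

X = list(range(1, 12))                       # toy spectrum, bins 0..10
print(sample_to_sample(X, 0, 10, 0.6))
print(superimpose(X, 0, 10, 0.6))
```

With a compression factor of 0.6 on this toy band, the sample-to-sample version never reads input bins 1, 4, 6, and 9, so part of the input is lost. The superimposed version uses every input bin, but some output bins receive two inputs and others only one, which is consistent with the "partly compensated" irregularity noted above.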
Spectral segment mapping
Output spectral sample = weighted sum of the complex spectral samples in the input frequency segment [a, b] corresponding to the output sample k'.
m, n: first and last FFT indices in the segment [a, b].
No irregular variation in spectrum and signal energy.
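A minimal sketch of spectral segment mapping for one band follows. The slides state only that each output sample is a weighted sum over the input segment [a, b]; the specific weights used here (the length of overlap between [a, b] and each input bin, with bin j taken to span j - 0.5 to j + 0.5) are an assumption, as are the function name and the toy band.

```python
import math

def spectral_segment_map(X, lo, hi, c):
    """Compress bins lo..hi towards the band center by factor c.

    Assumed weighting: each input bin j contributes in proportion to how
    much of its interval [j - 0.5, j + 0.5] lies inside the segment [a, b].
    """
    kc = (lo + hi) / 2
    band_lo, band_hi = lo - 0.5, hi + 0.5    # band extent on a continuous axis
    Y = [0] * len(X)
    for k_out in range(lo, hi + 1):
        # Input segment [a, b] that compresses onto output bin k_out.
        a = max(kc + (k_out - 0.5 - kc) / c, band_lo)
        b = min(kc + (k_out + 0.5 - kc) / c, band_hi)
        if b <= a:
            continue                          # output bin lies outside the compressed band
        m = max(lo, math.floor(a + 0.5))      # first FFT index in [a, b]
        n = min(hi, math.floor(b + 0.5))      # last FFT index in [a, b]
        for j in range(m, n + 1):
            w = min(b, j + 0.5) - max(a, j - 0.5)
            if w > 0:
                Y[k_out] += w * X[j]
    return Y

X = [1.0] * 11                                # flat toy spectrum, bins 0..10
Y = spectral_segment_map(X, 0, 10, 0.6)
print(Y)
print(sum(Y))
```

Because the segments for successive output samples tile the band without gaps or overlaps, every input sample contributes with a total weight of one. For a flat input, the interior output samples are all equal, which is the regular behaviour the slide attributes to this mapping. The same function works unchanged on complex spectral samples.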
Processing example
/aka/: (a) unprocessed, (b) processed (fixed-frame segmentation, spectral segment mapping, ACB, CF = 0.6).
Harmonic structure in voiced segments and randomness in unvoiced segments are approximately preserved.
Listening tests
Exp. A: MOS test (for quality assessment)
To find the effect of frequency mapping, bandwidth, and segmentation on the perceived quality of the compressed speech
▪ Test material: /aiu/, "we were away a year ago"
▪ Subjects: six normal-hearing, with simulated loss (SNR: ∞, 6, 0, -3 dB)
▪ Presentation: reference (unprocessed), 0.5 s silence, test (processed)
▪ Subject response for each presentation (randomized): rating on a 0 to 5 scale, with 3 assigned to the reference
Exp. B: MRT on normal-hearing subjects
▪ 6 normal-hearing subjects, with simulated loss (SNR: ∞, 6, 3, 0, -3, -6, -9, -12, -15 dB)
▪ No. of presentations: 10,800 (300 words × 4 compression factors × 9 SNR conditions)
▪ One or two sessions of 40 min duration per day, spread over a one-month period
Exp. C: MRT on hearing-impaired subjects
▪ 8 subjects with moderate-to-severe sensorineural loss (without using their hearing aids)
▪ No. of presentations: 1,200 (300 words × 4 compression factors)
▪ One or two sessions of 1 hr duration per day, spread over a one-month period
Results of MOS test
Avg. difference in MOS (CF = 0.6, test material: sentence)
Mappings: point-to-point, i.e. sample-to-sample (M1), superimposition of spectral samples (M2), spectral segment mapping (M3).
Bandwidth: ACB, 1/3-octave, constant bandwidth with 18 bands (CB18).
Segmentation: fixed-frame (FF), pitch-synchronous (PS).
Highest scores for pitch-synchronous segmentation, ACB, and spectral segment mapping.
Results of MRT on normal-hearing subjects: average recognition scores
Highest improvement for compression factor c = 0.6
▪ 17% increase in recognition score for SNR < -6 dB
▪ SNR advantage of 6 dB at about 60% recognition score
Results of MRT on normal-hearing subjects: average response time
Highest improvement for compression factor c = 0.6
▪ Decrease of 0.43 to 0.88 s (mean = 0.79 s) in the response time for SNR < 0 dB
MRT on hearing-impaired subjects: recognition scores (figure)
MRT on hearing-impaired subjects: response time (figure)
Summary of Investigations
▪ MOS test for quality assessment (CF = 0.6): highest scores for
• pitch-synchronous segmentation
• spectral segment mapping
• auditory critical bandwidth
▪ MRT on normal-hearing & hearing-impaired subjects: maximum improvement in speech perception for CF = 0.6
Conclusion
Multi-band frequency compression, applied to the complex spectrum with pitch-synchronous segmentation, auditory critical bandwidths, and spectral segment mapping, can be used as part of the signal processing in hearing aids for persons with moderate sensorineural hearing loss.
Further work
• Further evaluation using different types of test material and a larger number of subjects with different types of loss characteristics.
• Study of the effectiveness of the scheme in improving speech perception by normal-hearing and hearing-impaired listeners in the presence of different types of noise.
• Evaluation after incorporating frequency-dependent gain and multi-band amplitude compression in accordance with the loss characteristics of the individual listener.
• Real-time implementation for use in hearing aids.
Reference
P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss," Speech Communication, vol. 54, no. 3, pp. 341-350, March 2012. DOI: 10.1016/j.specom.2011.09.005.
Abstract: In multi-band frequency compression, the speech spectrum is divided into a number of analysis bands, and the spectral samples in each band are compressed towards the band center by a constant compression factor, resulting in presentation of the speech energy in relatively narrow bands, for reducing the effect of increased intraspeech spectral masking associated with sensorineural hearing loss. Earlier investigation assessing the quality of the processed speech showed best results for auditory critical bandwidth based compression using spectral segment mapping and pitch-synchronous analysis-synthesis. The objective of the present investigation is to evaluate the effectiveness of the technique in improving speech perception by listeners with moderate to severe sensorineural loss and to optimize the technique with respect to the compression factor. The listening tests showed maximum improvement in speech perception for a compression factor of 0.6, with an improvement of 9%–21% in the recognition scores for consonants and a significant reduction in response times.