This study explores a cohort-selective frequency compression method in hearing aids aimed at improving the perception of speech among individuals with sensorineural hearing loss. Traditional hearing aids often fall short for those with severe losses. By utilizing an adaptive threshold control system that classifies speakers by gender and phonological units, the frequency compression can be tailored to individual needs. Clinical trials demonstrated significant improvements in speech comprehension, especially with female talkers, indicating effectiveness for varied listening environments. Future applications could extend to broader auditory enhancements.
Towards a Cohort-Selective Frequency-Compression Hearing Aid
Marie Roch, Richard R. Hurtig, Jing Lui, and Tong Huang
Sensorineural Hearing Loss • Most common type of hearing loss • Affects more than 20 million people in the US alone • Caused by physiological problems in the cochlea
Traditional Hearing Aids • Amplification of frequency bands • Amplitude compression • Works best in situations with high SNR
Problems With Traditional Methods • Simple amplification is insufficient • Individuals with severe hearing loss cannot perceive formants • [Figure: “Where were you while we were away”, Harrington and Cassidy 1999, p. 110]
Preserving the formants • Frequency domain compression [Turner & Hurtig 1999] permits preservation of formants
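A minimal sketch of why compressing in the frequency domain can preserve formant structure. It assumes proportional compression, where every frequency is scaled by the same factor; the function name, the ratio of 0.7, and the formant values are illustrative assumptions, not settings from the study.

```python
def compress_frequency(freq_hz, ratio):
    """Map an input frequency to its compressed output frequency.

    Assumption: proportional compression, i.e. every frequency is
    multiplied by the same factor 0 < ratio <= 1.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return freq_hz * ratio


# Hypothetical formant frequencies (Hz) for a single vowel; illustrative only.
formants = [700.0, 1200.0, 2600.0]
compressed = [compress_frequency(f, 0.7) for f in formants]

# The formants move down in frequency, but the ratio between them is unchanged.
ratio_before = formants[1] / formants[0]
ratio_after = compressed[1] / compressed[0]
```

Under this assumption the formants shift into a lower frequency range while their relative spacing stays the same, which is the property the slide refers to as preserving the formants.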
Effectiveness • Clinical study of 15 hearing-impaired listeners showed improvement that varied with the talker group • Female talkers: 45% improvement • Male talkers: 20% improvement • [Figure panels: Female Talker, Uncompressed; Female Talker, Compressed]
Challenges • Not all voices require the same level of compression • Single setting leads to inappropriate levels of compression
Adaptive thresholds • Decision-based control mechanism • Establish cohorts and compress according to cohort class. • Some possible cohorts: • Phonological units • Pitch • Speaker “gender”
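The decision-based control idea can be sketched as a lookup from the detected cohort to a compression setting. The cohort names follow the slides; the ratio values, the table name, and the fallback behavior are hypothetical placeholders, not values from the study.

```python
# Hypothetical compression ratios per cohort (1.0 means no compression).
# These numbers are placeholders chosen for illustration only.
COMPRESSION_BY_COHORT = {"female": 0.7, "male": 0.85, "child": 0.6}
DEFAULT_RATIO = 1.0


def select_compression(cohort):
    """Return the compression ratio for a cohort label.

    Unknown labels fall back to DEFAULT_RATIO (an assumption of this sketch).
    """
    return COMPRESSION_BY_COHORT.get(cohort, DEFAULT_RATIO)


# Example: the classifier reports a cohort for the current stretch of speech.
ratio = select_compression("female")
```

Keeping the mapping in a table separates the classifier from the compressor, so other cohort definitions (phonological units, pitch) could be swapped in without changing the control logic.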
Gender-based classifier • Selected “gender” for first study. • Female, Male, Child • Classifier output more stable than with phonological approaches. • Broad support in the literature for the ability of both humans and machines to do this.
Classifier • Gaussian mixture models • Features extracted from 25 ms windows shifted every 10 ms • Energy • 12 Mel-filtered cepstral coefficients (MFCC) • Time-derivatives of Energy & MFCC
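The front end and scoring steps above might look roughly like the following. This is a sketch under stated assumptions: only framing, log energy, and a simple delta are shown (MFCC extraction is omitted), the mixtures use diagonal covariances, and all function names and model parameters are hypothetical.

```python
import math


def frame_signal(samples, sample_rate, win_ms=25, shift_ms=10):
    """Split samples into 25 ms windows shifted every 10 ms."""
    win = int(sample_rate * win_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, shift)]


def log_energy(frame, floor=1e-10):
    """Log of the summed squared samples, floored to avoid log(0)."""
    return math.log(max(sum(x * x for x in frame), floor))


def deltas(values):
    """Central-difference time derivative with edge replication."""
    n = len(values)
    return [(values[min(i + 1, n - 1)] - values[max(i - 1, 0)]) / 2.0
            for i in range(n)]


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    peak = max(comp)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comp))


def classify(x, models):
    """Pick the cohort whose GMM gives x the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(x, *models[name]))


# One second of silence at 8 kHz yields 98 frames of 200 samples each.
frames = frame_signal([0.0] * 8000, 8000)

# Toy one-dimensional, single-component models (placeholders, not trained).
toy_models = {
    "female": ([1.0], [[2.0]], [[1.0]]),
    "male": ([1.0], [[-2.0]], [[1.0]]),
}
label = classify([1.5], toy_models)
```

In practice each frame's feature vector would hold the energy, the 12 MFCCs, and their time derivatives, and each cohort model would be trained on held-out speech rather than set by hand.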
LDC SPIDRE Corpus • Conversational telephone speech • Band-limited, 8 kHz • Mu-law encoded • Endpointed with the NIST/Kubala endpointer • Train: single sides of same-gender phone calls, 25 male & female • Test: 87 annotated cross-gender phone calls, about 7 hours of calls (~5 min. each)
Error analysis • Many errors occurred in fricatives, which have high-frequency energy • [Figure annotation: telephone bandwidth]
Evaluation on TIMIT • 630 speakers, clean speech, 16 kHz corpus • Train: 25 male, 25 female • Test: 413 male, 167 female • [Figure panels: SPIDRE; TIMIT]
Median Smoothing (SPIDRE) • [Figure label: median smoothed]
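A minimal sketch of median smoothing applied to per-frame classifier decisions, so that isolated misclassified frames do not flip the cohort. The slides do not give the window length or the label encoding; the width of 3 frames and the binary 0/1 coding here are assumptions for illustration.

```python
from statistics import median_low


def median_smooth(labels, width=3):
    """Replace each frame label with the median of a centered window.

    Labels are assumed to be integer codes (e.g. 0 = male, 1 = female).
    Windows are truncated at the edges; median_low keeps the result a
    valid label when a truncated window has an even length.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    return [median_low(labels[max(0, i - half):i + half + 1])
            for i in range(len(labels))]


# Frame-level decisions with a few isolated errors.
raw = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
smoothed = median_smooth(raw, width=3)
```

A longer window removes longer runs of errors but also delays the response to a genuine change of talker, so the width is a trade-off that would need tuning on real data.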
Conclusions & Future Work • Classifier-based control systems are feasible • They can be applied to other signal enhancement algorithms • They need not be limited to the cohorts presented today (e.g. auditory scene analysis)