
Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora. Javier Macías-Guarasa International Computer Science Institute Berkeley, CA - USA. Overview. Introduction Acoustic adaptation MR SI task MR SD task Accent identification MR SI task FAE task Conclusions


Presentation Transcript


  1. Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora Javier Macías-Guarasa International Computer Science Institute Berkeley, CA - USA

  2. Overview • Introduction • Acoustic adaptation • MR SI task • MR SD task • Accent identification • MR SI task • FAE task • Conclusions • Future work

  3. Introduction (I) • Work on improving WER for non-native speakers in the ICSI MR corpus • General details on the Meeting Recording corpus: • Number of speakers: 61 • Speech segmented: 85:08:21 (hh:mm:ss) • Number of accents: 15 • ‘Workable’ accents (duration, male + female speakers): • American 53:12:35 15m+8f • German 11:37:01 10m+2f • Spanish 04:38:24 4m+1f • British 01:03:45 2m+0f (just for reference)

  4. Introduction (II) • Initial idea: • Pronunciation modeling for non-native speakers • Acoustic adaptation techniques to be tested first: • SRI Decipher system capabilities: • MAP/MLLR/PhoneLoop • Analyze different strategies • Speaker-dependent and speaker-independent tasks

  5. Introduction (III) • Accent identification: • Needed to effectively use accent-dependent models in a real-world system • Emphasis on ‘practical’ approaches using, again, SRI Decipher capabilities • MR task is a difficult acoustic environment: • Low number of speakers/speech material • Dominance of certain speakers (more details?) • FAE task also approached

  6. Introduction (VI) Baseline WERs • Using SRI 2003 system, WER:
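The WER figures quoted throughout these slides presumably follow the usual word-level edit-distance definition. The slides do not say which scoring tool was used, so the following is only a minimal sketch of that standard definition; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Both arguments are whitespace-separated word strings. This is the textbook
    definition, not necessarily the exact scoring used for the slides.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, the value can exceed 1.0 (100%) for very poor hypotheses.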

  7. Acoustic adaptation (I) • Initial studies with the old partitioning show that global task adaptation through MAP is the best approach: • Accent-dependent MAP adaptation also promising • Initial attempt at full retraining using 16 kHz speech (with 8 kHz speech as reference): • Very bad results (more details?) • Worse than baseline!! • Too few speakers in the training set given the task partition (speaker independent)

  8. Acoustic Adaptation (II) Previous work • Interest in language learning tools (CALL) • Standard acoustic adaptation techniques • MAP/MLLR using L1 or L2 speech data • Model interpolation • Clustering • Sufficient for high proficiency speakers • Pronunciation modeling: • Little (if any) success reported

  9. Acoustic Adaptation (IV) Objectives • Strategies for SI task (combined improvement?): • Task MAP adaptation (TaskMAP) • Accent-dependent MAP (AccMAP) • TaskMAP followed by AccMAP/MLLR • Strategies for SD task: • Task MAP adaptation (TaskMAP) (includes speaker adaptation) • Per-speaker MAP adaptation (SpkMAP)

  10. Acoustic adaptation (V) • Strategies for acoustic adaptation (TaskMAP, AccMAP, Task+AccMAP): • Adaptation weights tuned per accent (held-out data) • Final phoneloop stage • [Diagram: SWB models → MAP (task adaptation) on the full DB → global MAP models; SWB or global MAP models → MAP/MLLR on the Am, Ge, ..., Sp DBs → Am, Ge, ..., Sp MAP models]
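As a reminder of what a MAP adaptation weight controls, here is a minimal sketch of the textbook MAP update for a single Gaussian mean. It is not the SRI Decipher implementation, whose parametrization is not given in the slides; the function name `map_adapt_mean` and the prior weight `tau` are assumptions made for illustration.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau):
    """Textbook MAP update of one Gaussian mean.

    prior_mean:  mean of the unadapted model (list of floats)
    frames:      adaptation feature vectors (list of lists of floats)
    occupancies: per-frame posterior occupancy of this Gaussian (gamma_t)
    tau:         prior weight (assumed > 0 when there is no adaptation data);
                 a larger tau keeps the result closer to prior_mean
    """
    dim = len(prior_mean)
    total_occ = sum(occupancies)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, occupancies):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the data mean, with the
    # balance set by tau against the amount of adaptation data.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occ)
            for d in range(dim)]
```

With little adaptation data the adapted mean stays near the prior, and with a lot of data it approaches the data mean, which is why the best weight can differ between accents with very different amounts of speech.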

  11. Acoustic Adaptation (VII) MR Speaker Independent Task • SI adaptation summary, WER:

  12. Acoustic Adaptation (VIII) MR Speaker Independent Task • SRI 5xRT system: • Using new dictionary and interpolated LMs • Using best MAP-adapted models for mel features • Still some bugs in the process (more details?)

  13. Acoustic Adaptation (X) MR Speaker Dependent Task • SD adaptation summary, WER:

  14. Accent Identification (I) • Background: • Techniques similar to Language Identification • GMM based: • Broad collection of features • GMM tokenizers • Broad phonetic classes + HMMs • LM/AM score comparison • Based on phonotactic characteristics: • PPRLM, PRLM • More complex than LID • Hard to compare rates: No previous work in MR/FAE

  15. Accent Identification (II) Objectives • Strategy: Use SRI Decipher characteristics • Practical approach: Reasonable run times • GMM classification module (for gender detection) • Evaluate standard features and normalizations • Hypothesis driven, phone recognition: • CD/CI models • Recognition using flat Phone LM or flat LM • View as a text classification problem • Phone LM driven: • PRLM/PPRLM • Combination using NNs

  16. Accent Identification (III) MR data: GMM classification approach • GMM results for MR corpus: • Unbalanced data → tested over- and undersampling • Use different features & normalization: • No significant differences (except when using voicing features): • Lack of data • ~Uniform channels
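The GMM classification approach can be sketched as follows: one GMM per accent scores the feature frames of an utterance (or chunk), and the accent with the highest average frame log-likelihood wins. This is a generic illustration with diagonal covariances, not the Decipher GMM module; all function names and the model layout (a list of `(weight, mean, variance)` tuples) are assumptions.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    # log-sum-exp over mixture components, for numerical stability
    return top + math.log(sum(math.exp(value - top) for value in logs))


def classify_accent(frames, accent_gmms):
    """Return (best_accent, scores); scores are average frame log-likelihoods."""
    scores = {
        accent: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for accent, gmm in accent_gmms.items()
    }
    return max(scores, key=scores.get), scores
```

Averaging over frames makes scores comparable across utterances of different length, although short utterances still give noisier decisions.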

  17. Accent Identification (IV) MR data: GMM classification approach • GMM results for MR corpus : • As a function of utterance length task AM-GE-SP-BR

  18. Accent Identification (V) MR data: Hypothesis driven approach • Text classification view using MR data: • Input from phone recognition: • From free phone recognition (CD/CI models, full/flat PLM) • Rainbow: CMU tool for text classification • Naive Bayes classification technique • N-grams (1..6) • No further restrictions (feature selection, stop list, etc.)
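The text classification view treats each recognized phone string as a "document" whose "words" are phone n-grams. The experiments used the Rainbow tool; the following is only a small stand-alone sketch of multinomial Naive Bayes with add-one smoothing over phone n-grams, with hypothetical function names and toy phone strings, not a reproduction of Rainbow.

```python
import math
from collections import Counter


def phone_ngrams(phones, orders=(2, 3)):
    """Turn a phone sequence into a bag of n-gram tokens, e.g. 'd a s'."""
    grams = []
    for n in orders:
        for i in range(len(phones) - n + 1):
            grams.append(" ".join(phones[i:i + n]))
    return grams


def train_naive_bayes(labeled_strings, orders=(2, 3)):
    """labeled_strings: iterable of (accent, phone_list) pairs."""
    counts, docs, vocab = {}, Counter(), set()
    for accent, phones in labeled_strings:
        grams = phone_ngrams(phones, orders)
        counts.setdefault(accent, Counter()).update(grams)
        docs[accent] += 1
        vocab.update(grams)
    return {"counts": counts, "docs": docs, "vocab": vocab, "orders": orders}


def classify(model, phones):
    """Return the accent with the highest posterior log-score."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_accent, best_score = None, -math.inf
    for accent, gram_counts in model["counts"].items():
        total = sum(gram_counts.values())
        score = math.log(model["docs"][accent] / total_docs)  # class prior
        for gram in phone_ngrams(phones, model["orders"]):
            # add-one smoothing so unseen n-grams do not zero out the score
            score += math.log((gram_counts[gram] + 1) / (total + vocab_size))
        if score > best_score:
            best_accent, best_score = accent, score
    return best_accent
```

Restricting `orders` to bigrams and trigrams is an illustrative default here; any subset of the 1..6 range can be passed instead.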

  19. Accent Identification (VI) MR data: Hypothesis driven approach • Text classification view using MR data: • Best results using CI models + flat PLM (bigrams & trigrams) • Chunk based classification rates (simulation):

  20. Accent Identification (VII) MR data: Hypothesis driven approach • Text classification view using MR data: • Utterance based classification rates (simulation): • Need longer sequences!!

  21. Accent Identification (VIII) MR data: Hypothesis driven approach • Text classification view using MR data: • Real partition classification rates: • Worse-than-chance rates if utterance based (length-dependent AI task still to be done)

  22. Accent Identification (IX) Phone LM approach • PRLM: Phone recognition & LM • [Diagram: Speech → phone recognizer (AM) → LM scoring with PLM accent 1 ... PLM accent N → score comparison → decision]

  23. Accent Identification (X) Phone LM approach • PRLM: Phone recognition & LM • Tested different AMs for phonetic string generation: • Std forced • Std SWB • MAP adapted per accent • Best is Std SWB • Tested 1- to 6-gram LMs: • Best is trigram • But very poor results
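The PRLM decision rule can be sketched as follows: a single phone recognizer produces a phone string, one phone LM per accent scores it, and the best-scoring accent is chosen. The code below is an illustrative sketch only, with add-one smoothed n-gram LMs and per-phone score averaging; the function names, the smoothing and the normalization are assumptions, not the setup used in the experiments.

```python
import math
from collections import Counter


def train_phone_lm(phone_strings, order=3):
    """Count n-grams and their contexts over phone strings of one accent."""
    ngram_counts, context_counts, vocab = Counter(), Counter(), set()
    for phones in phone_strings:
        padded = ["<s>"] * (order - 1) + list(phones)
        vocab.update(phones)
        for i in range(order - 1, len(padded)):
            context = tuple(padded[i - order + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return {"order": order, "ngrams": ngram_counts,
            "contexts": context_counts, "vocab_size": len(vocab)}


def lm_score(lm, phones):
    """Average per-phone log-probability under an add-one smoothed phone LM."""
    if not phones:
        raise ValueError("phone string must not be empty")
    order = lm["order"]
    padded = ["<s>"] * (order - 1) + list(phones)
    total = 0.0
    for i in range(order - 1, len(padded)):
        context = tuple(padded[i - order + 1:i])
        numerator = lm["ngrams"][context + (padded[i],)] + 1
        # "+ 1" reserves one extra slot for phones never seen in training
        denominator = lm["contexts"][context] + lm["vocab_size"] + 1
        total += math.log(numerator / denominator)
    return total / len(phones)


def identify_accent(accent_lms, phones):
    """Pick the accent whose phone LM gives the highest average score."""
    return max(accent_lms, key=lambda accent: lm_score(accent_lms[accent], phones))
```

Note that raw LM scores from models trained on very different amounts of data are not directly comparable, so some form of score normalization may be needed before the comparison step.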

  24. Accent Identification (XI) MR data: Phone LM approach • PRLM: Phone recognition & LM: • As a function of utterance length, task AM-GE-SP-BR: Very bad results

  25. Accent Identification (XII) Phone LM approach • PPRLM: Parallel Phone recognition & LM • [Diagram: Speech → parallel phone recognizers (models A ... Z) → per recognizer, LM scoring for accents a ... z → score comparison → per-accent averages (avg accent a ... avg accent z) → decision]

  26. Accent Identification (XIII) FAE database • Experiments with the FAE database: • 4500 speakers: More acoustic context • 20 seconds per speaker • Proficiency is labeled • Strategy: • Apply standard techniques • Possibly: • Use FAE-generated models in MR data

  27. Accent Identification (XIV) FAE database: GMM classification • GMM: • Gender-independent classification (16-2048) • FAE results in GE-SP task: • Norm better than CMN. CMN better than plain features • GD models still to be tested

  28. Accent Identification (XV) FAE database: GMM classification • GMM: • Combining FAE models with MR data: • Using frame_cepstrum + CMN (GMM 256) • Combination is possible, but more experiments are needed!!

  29. Accent Identification (XVI) FAE database: hypothesis driven • Text classification view: • FAE results: • Better than chance but, still, far from useful • FAE models still to be tested on MR data

  30. Accent Identification (XVII) FAE database: Phone LM approach • PRLM/PPRLM: • Pending • GMM better than text based classification. GE-SP task, for example: • GMM: 72.0% • Text-based: 58.9% • Results as a function of speech length to be evaluated

  31. Conclusions • Acoustic adaptation is important for coping with non-native accents: • MAP adaptation provided best results: • Task adaptation + accent adaptation • Work on tuning adaptation weights for SD & SI task (magnitude differences) • Low-proficiency speakers need additional improvements • Non-native speech recognition may not be solvable!

  32. Conclusions • Accent identification: • Proved to be more difficult than LID • Different techniques applied: • GMM techniques and text classification techniques showed promising results • Standard PRLM strategy didn’t work as expected (score normalization needed?) • PPRLM to be tested • Integration to be tested

  33. Future work • Finish current experimentation: • Accent identification: • Test features and normalizations in GMM and phone LM based • Test acoustic scores ratios • Test LM scores • Test NN based combination • NonNat speech characterization: • Errors phone/word • Model ‘usage’ distributions

  34. Future work • Pronunciation modeling: • Evaluation of pronunciation variants found in the SRI SWB dictionary for NonNat speech • Rule based: • Rules in German (from Silke Goronzy’s work) • Rules in Spanish • ‘Speaking mode’ probability estimation (accent + …) • Use of new databases (FAE, TED, Fisher)

  35. Future work • A note on work on pronunciation modeling in the MR task: • The MR corpus is not suitable for data-driven pronunciation modeling: • High error rates for non-native speakers & limited number of them • Rule-based methods are to be tested first • Initial work on evaluating current pronunciation alternatives is needed • Relevant rules for initial testing in German and Spanish are already available

  36. Thank you!! • To ICSI and the ICSI Speech Group, with special thanks to: • Morgan • Andreas • Qifeng, Barry, Adam, Yang, Yan, Dave, Jeremy, … • Sven & all international visitors • The FrameNet people (Miriam, Michael & Co.) • Staff, especially Lila, María Eugenia and Diane

  37. MR Partitioning • Speaker independent (SI subtask)

  38. Full retraining • Initial attempt at full retraining using 16 kHz speech: • With old partitioning • Too few speakers in the training set given the task partition (speaker independent)

  39. Speaker dominance • Few speakers concentrate most speech material:

  40. Acoustic adaptation (IV) • SI TaskMAP adaptation, WER: • Optimal MAP weight ~proportional to size of accented speech subset • Bigger improvements for non-native accents • Bigger improvements for bigger data size

  41. Acoustic adaptation (IV) • SI AccMAP adaptation, WER: • Trends similar to TaskMAP, but no further improvements, except for German → benefits from task data!

  42. Acoustic adaptation (IV) • SI TaskMAP+AccMAP adaptation, WER: • Small improvements over TaskMAP • Also tested MLLR instead (TaskMAP+AccMLLR), but no improvements

  43. Gender ID issues • Gender identification: • Per chunk gender ID: • Per utterance gender ID:

  44. Acoustic Adaptation: SRI 5xRT System results

      British      002-mel  002-mel-expanded  005-plp  005-plp-rescored  rover
      STD          91.5     89.3              78.2     80.6              78.7
      Adapted      87.9     84.3              78.8     79.7              79.7
      BestSimple   87.9

      Spanish      002-mel  002-mel-expanded  005-plp  005-plp-rescored  rover
      STD          97.5     94.2              95.9     95.1              93.4
      Adapted      88.0     85.1              88.4     88.2              86.6
      BestSimple   93.2

      German       002-mel  002-mel-expanded  005-plp  005-plp-rescored  rover
      STD          50.7     47.6              44.7     44.5              44.1
      Adapted      45.8     45.8              44.4     44.5              44.9
      BestSimple   42.3

      American     002-mel  002-mel-expanded  005-plp  005-plp-rescored  rover
      STD          36.3     33.2              31.2     30.8              31.0
      Adapted      33.6     34.3              33.3     33.1              33.6
      BestSimple   30.4
