
Automatic Detection-based Phone Recognition on TIMIT

Based on Chen and Wang in ISCSLP’08 and Interspeech’09. Hung-Shin Lee (李鴻欣). 12 July 2011 @ IIS, Academia Sinica.


Presentation Transcript


  1. Automatic Detection-based Phone Recognition on TIMIT
     Based on Chen and Wang in ISCSLP’08 and Interspeech’09
     Hung-Shin Lee (李鴻欣)
     12 July 2011 @ IIS, Academia Sinica

  2. Detection-Based ASR
     • Human SR: Knowledge Detection → Integration → Knowledge (Higher Level)
     • DB ASR: Detectors → Integrator → Results
     • Detectors: Phonological attr., Prosodic attr., Acoustic attr., …
     • Integrator: HMM, CRF, …
     • Results: Phone, Syllable, Word, Sentence, Semantic info, …

  3. Phonological Systems

  4. Phonological Feature Detection (1)
     [Figure: MLP detectors. Input layer: 13 MFCCs per frame over a 9-frame window (frames i-4 to i+4); hidden layer; outputs are posterior probabilities, quantized to binary (0/1) values. Feature sets: SPE_14 and GP_11. Network labels: time-delay, recurrent.]
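The detector input and output described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the edge-padding strategy, and the 0.5 quantization threshold are all assumptions, since the slide only gives the window size (9 frames, i-4 to i+4), the 13 MFCCs, and the fact that posteriors are quantized.

```python
def stack_context(frames, left=4, right=4):
    """Concatenate each frame with its neighbours (i-left .. i+right).

    Edge frames are padded by repeating the first/last frame
    (an assumption; the slide does not say how edges are handled).
    """
    n = len(frames)
    stacked = []
    for i in range(n):
        window = []
        for j in range(i - left, i + right + 1):
            k = min(max(j, 0), n - 1)  # clamp the index at utterance edges
            window.extend(frames[k])
        stacked.append(window)
    return stacked


def quantize(posteriors, threshold=0.5):
    """Turn per-feature posteriors into binary values (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in posteriors]


# Toy utterance: 6 frames of 13 made-up "MFCC" values each.
frames = [[float(t)] * 13 for t in range(6)]
stacked = stack_context(frames)
print(len(stacked), len(stacked[0]))      # 6 117  (9 frames x 13 MFCCs)
print(quantize([0.91, 0.08, 0.5, 0.49]))  # [1, 0, 1, 0]
```

With 13 coefficients and a 9-frame window, each MLP input vector has 117 values; the quantized outputs play the role of the binary phonological features passed on to the integrator.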

  5. Phonological Feature Detection (2)
     [Figure: MV_29 detection. Input: 13 MFCCs per frame over a 9-frame window (frames i-4 to i+4). 6 MV features; MLPs shown: MLP (Centrality), MLP (Front-Back), MLP (Roundness), each producing a binary output vector. Network label: time-delay.]

  6. Conditional Random Field (CRF) Integrator
     • General chain CRF
     • λj, μk: feature function weight parameters
     • State feature function; transition feature function
     [Figure: chain of output (phone) labels Y = …, yi-1, yi, … over input (phonological feature) observations X = …, xi-1, xi, xi+1, …]
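The equation itself did not survive in this transcript. For orientation, the standard linear-chain CRF can be written as below. This is the textbook form, not a recovery of the slide's exact notation: pairing λj with the state feature functions s_j and μk with the transition feature functions t_k is an assumption based on the order in which the slide lists them.

```latex
p(\mathbf{y} \mid \mathbf{x}) =
  \frac{1}{Z(\mathbf{x})}
  \exp\!\Bigg( \sum_{i} \Big[
      \sum_{j} \lambda_j \, s_j(y_i, \mathbf{x}, i)
    + \sum_{k} \mu_k \, t_k(y_{i-1}, y_i, \mathbf{x}, i)
  \Big] \Bigg),
\qquad
Z(\mathbf{x}) =
  \sum_{\mathbf{y}'}
  \exp\!\Bigg( \sum_{i} \Big[
      \sum_{j} \lambda_j \, s_j(y'_i, \mathbf{x}, i)
    + \sum_{k} \mu_k \, t_k(y'_{i-1}, y'_i, \mathbf{x}, i)
  \Big] \Bigg)
```

Here y is the phone label sequence, x is the phonological feature sequence, and Z(x) normalizes over all candidate label sequences.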

  7. CRF Integrator – Training Issues
     • Required labels for CRF training
       • Phone: y
       • Phonological features: x
     • Oracle-data trained CRF (OT CRF): the phone labels of the training data are mapped (phones → phonological features) to obtain the phonological features
     • Detected-data trained CRF (DT CRF): speech is passed through the MLP detectors to obtain the phonological features (with errors); the phone labels come from the training data
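The difference between the two training conditions can be sketched as follows. The three-feature mapping table is hypothetical and heavily reduced (it is not SPE_14, GP_11, or MV_29), and the "detected" features are hard-coded stand-ins for MLP detector output; only the overall data flow follows the slide.

```python
# Hypothetical, heavily reduced mapping table (NOT one of the feature
# systems used in the talk); columns follow FEATURES.
FEATURES = ("vocalic", "voiced", "nasal")
PHONE_TO_FEATURES = {
    "h#": (0, 0, 0),
    "n":  (0, 1, 1),
    "eh": (1, 1, 0),
    "s":  (0, 0, 0),
}


def oracle_features(phone_labels):
    """Oracle data: derive x directly from the phone labels y."""
    return [PHONE_TO_FEATURES[p] for p in phone_labels]


def training_pairs(features, phone_labels):
    """Pair each feature vector x_i with its phone label y_i."""
    if len(features) != len(phone_labels):
        raise ValueError("features and labels must have the same length")
    return list(zip(features, phone_labels))


phones = ["h#", "n", "eh", "s", "h#"]

# OT CRF: x comes from the mapping table, so it is error-free.
ot_data = training_pairs(oracle_features(phones), phones)

# DT CRF: x comes from the detectors; here a made-up output in which
# the nasal feature of "n" was missed.
detected = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0), (0, 0, 0)]
dt_data = training_pairs(detected, phones)

mismatches = sum(a[0] != b[0] for a, b in zip(ot_data, dt_data))
print(mismatches)  # 1
```

Both conditions share the same phone labels y; they differ only in whether x is error-free or carries detector errors.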

  8. Experiments
     • Corpus: TIMIT
       • No SA1, SA2
       • Training set (3296 utts), Dev set (400 utts)
       • Test set (1344 utts)
     • Phone set: TIMIT61
     • Evaluation: CMU/MIT 39
     • Baseline
       • CI-HMM
     • Toolkits
       • Nico Toolkit (for MLP), CRF++ (for CRF)
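Training on 61 phones and scoring on 39 implies a folding step before evaluation. The sketch below uses the commonly cited 61-to-39 folding for TIMIT and scores phone accuracy as 1 minus edit distance over reference length. Both are assumptions about the setup: the slides name only "CMU/MIT 39" and do not say how accuracy was computed.

```python
# Commonly used TIMIT 61 -> 39 folding (assumed, not taken from the slides).
FOLD = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil", "dcl": "sil",
    "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}
DROP = {"q"}  # the glottal stop is removed entirely


def fold(phones):
    """Map TIMIT61 labels to the 39-phone evaluation set."""
    return [FOLD.get(p, p) for p in phones if p not in DROP]


def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


def phone_accuracy(ref, hyp):
    """Accuracy = (N - S - D - I) / N, computed after folding."""
    ref39, hyp39 = fold(ref), fold(hyp)
    return 1.0 - edit_distance(ref39, hyp39) / len(ref39)


ref = ["h#", "n", "eh", "s", "tcl", "t", "ix", "n", "h#"]
hyp = ["h#", "n", "ae", "s", "t", "ix", "en", "h#"]
print(round(phone_accuracy(ref, hyp), 3))  # 0.778
```

In the toy example the hypothesis has one substitution (eh → ae) and one deletion (the folded closure), while ix/ih and en/n count as matches after folding.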

  9. Results (1)
     • Model: OT CRF; Test: OD Features
     • Model: OT/DT CRF; Test: DD Features

  10. Results (2) System Fusion

  11. System Fusion with CRF
      [Figure: chain CRF whose inputs X (…, xi-1, xi, xi+1, …) are the phone sequences from the SPE, MV, GP, and HMM systems, and whose outputs Y (…, yi-1, yi, …) are the combined results (phone).]
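One way to feed several systems into a single CRF is to treat each system's phone hypothesis as one input column. The sketch below writes such data in the whitespace-separated column layout that CRF++ reads (one token per line, answer label in the last column, a blank line between sequences). That the four systems are aligned position by position, along with the function name and the toy labels, is an assumption; the slides do not give this detail.

```python
def fusion_rows(system_outputs, reference):
    """Build CRF++-style training text for one utterance.

    system_outputs: dict mapping system name -> list of phone labels,
                    one per position (assumed already aligned).
    reference:      list of reference phone labels, same length.
    """
    n = len(reference)
    for name, labels in system_outputs.items():
        if len(labels) != n:
            raise ValueError(f"{name} has {len(labels)} labels, expected {n}")
    lines = []
    for i in range(n):
        columns = [labels[i] for labels in system_outputs.values()]
        lines.append(" ".join(columns + [reference[i]]))
    # A blank line terminates the sequence in CRF++ data files.
    return "\n".join(lines) + "\n\n"


# Made-up outputs for a three-position stretch of speech.
outputs = {
    "SPE": ["n", "eh", "s"],
    "MV":  ["n", "ae", "s"],
    "GP":  ["m", "eh", "s"],
    "HMM": ["n", "eh", "z"],
}
text = fusion_rows(outputs, ["n", "eh", "s"])
print(text, end="")
```

Each row then carries four observed columns and one answer column, so a feature template can reference any system's hypothesis at the current or neighbouring positions.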

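  12. Two Types of AFDT Imperfection
      • AF asynchrony
      • AFDT errors
      [Figure: phone labels (h# n eh ow kcl k w eh ae eh s tcl t ix n) shown against the AF tracks AF(A) and AF(A’), illustrating AF asynchrony and AFDT errors.]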

  13. CRF Training (1)
      • Oracle Data Training: phone y over time t → mapping table → AFs x
      • Detected Data Training: phone y over time t; AFs x come from the AFDT and contain detected errors

  14. CRF Training (2)
      • Aligned Data Training: the AF sequence from the AFDT (AFs x) is aligned with phone y over time t

  15. Results (3)
      • Introducing AF asynchrony causes a 27.97% drop in accuracy
      • Detection errors cause a further 7.99% drop in accuracy

  16. AF Asynchrony Compensation
      • AF asynchrony is caused by context variation
      • We can reduce AF asynchrony by letting our systems learn context variation directly, using long-term information
      [Figure: long-term context network. 23-dim Mel features over 310 ms are split into left-context and right-context branches (windows + DCTs, 72 dim, one MLP each); the two branches are merged (144 dim) by a further MLP.]

  17. Results (4)

  18. Conclusions
      • A well-designed phonological feature system is important
      • AF asynchrony minimization training and AF-phone synchronization could also be investigated
      • Oracle Trained CRF is able to retrieve more phonological information from speech
      • High phone correction rate (but sensitive to detection error)
      • Helpful for combination
      • Detection-Based ASR is promising
      • A front-end detector is a major issue

  19. AF and Phone Alignment Using AFDT
      [Figure: phone sequence and AF sequence aligned along the time axis t.]
