
What I did on my Summer “Vacation”

What I did on my Summer “Vacation”. Jeremy Morris, 10/06/2006. Summer at AFRL - DAGSI. AFRL: Air Force Research Labs, Wright-Patterson AFB, Dayton OH. DAGSI Student/Faculty Research Fellowship program, Dayton Area Graduate Studies Institute.


Presentation Transcript


  1. What I did on my Summer “Vacation” Jeremy Morris 10/06/2006

  2. Summer at AFRL - DAGSI • AFRL • Air Force Research Labs • Wright-Patterson AFB, Dayton OH • DAGSI Student/Faculty Research Fellowship program • Dayton Area Graduate Studies Institute • Effort to encourage collaboration between Ohio universities and AFRL

  3. Summer at AFRL – SCREAM Lab • SCREAM Lab • Speech and Communication Research, Engineering, Analysis and Modeling Lab • Interest in a wide variety of speech research issues for the military • Speech-to-speech translation, rapid development of speech recognition systems, etc.

  4. Summer at AFRL – Why us? • SCREAM Lab members were interested in collaborating with OSU • SCREAM Lab working on research in using phonological features in speech recognition • Perceived overlap with ASAT project

  5. Review – Phonological Features • For the ASAT Project, we have been using phonological feature detectors • We train each detector on a particular phonological feature • e.g. manner or place for consonants; height, frontness, etc. for vowels • We then combine these features for ASR purposes

  6. Phonological Features (cont.) • SCREAM Lab very interested in phonological feature detectors • Need for quick development of new ASR systems for new languages • A full set of phonological feature detectors would allow reuse of acoustic data for training across new languages • Multi-lingual detectors are clearly needed to get full coverage of all features

  7. Phonological Features (cont.) • Our phonological feature detectors • Monolingual (English only) • Trained using a set of multi-layer perceptron neural networks • Output a set of phonological feature class probabilities • SCREAM lab feature detectors • Monolingual and multilingual • Trained using Gaussian Mixture Models • Output a set of likelihoods • Based on work by Tanja Schultz (CMU)

  8. Summer at AFRL - Proposal • Besides acoustic models, new ASR systems for new languages have other needs • An ASR system needs a lexicon mapping words to phone sequences • Normally hand-constructed • Requires time and expertise

  9. Summer at AFRL - Proposal • Our proposal: look at methods of bootstrapping new lexicons from: • Acoustic data • Word-level transcripts • Phonological feature detector outputs • How? • Start by looking at work on deriving Acoustic Sub-Word Units

  10. Summer at AFRL - Proposal • Acoustic Sub-Word Units (ASWUs) • Similar to phones in that they are smaller pieces of words • BUT – automatically derived from acoustics instead of manually defined • Used to derive both a sub-word unit set and a lexicon for that set simultaneously • Research in this area has been mainly to improve ASR performance

  11. Summer at AFRL - Proposal • Can we use these methods along with phonological features as inputs to induce new lexicons? • Using phonological features, the sub-word units may be mappable to standard IPA phone labels

  12. Summer at AFRL - Proposal • The proposed system is inspired by an ASWU method of Singh et al. (2002) • Notable for not requiring word boundaries to be marked for training • Start with a basic dictionary (including a starting phoneset size) • Train a set of acoustic models on the training data with that dictionary • Alter the basic dictionary in a way that improves the pronunciations • Repeat until a stopping criterion is reached
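The train/alter/repeat loop on this slide can be sketched generically. This is a minimal Python sketch, not the author's code: `refine_lexicon`, `train`, `alter` and `score` are hypothetical names, and the stopping criterion used here (score gain below `min_gain`, or `max_iters` proposals) is an assumption, since the slide does not say which criterion was used.

```python
from typing import Callable, Dict, List, Tuple

Lexicon = Dict[str, List[str]]  # word -> phone labels


def refine_lexicon(
    lexicon: Lexicon,
    train: Callable[[Lexicon], object],           # hypothetical: trains acoustic models
    alter: Callable[[Lexicon, object], Lexicon],  # hypothetical: proposes a modified lexicon
    score: Callable[[Lexicon, object], float],    # hypothetical: higher is better
    max_iters: int = 10,
    min_gain: float = 1e-3,
) -> Tuple[Lexicon, int]:
    """Train, alter, retrain; stop when the score gain falls below min_gain
    or after max_iters proposals (assumed stopping criterion)."""
    model = train(lexicon)
    best = score(lexicon, model)
    proposals = 0
    for _ in range(max_iters):
        candidate = alter(lexicon, model)
        cand_model = train(candidate)
        cand_score = score(candidate, cand_model)
        proposals += 1
        if cand_score - best < min_gain:
            break  # no meaningful improvement: keep the current lexicon
        lexicon, model, best = candidate, cand_model, cand_score
    return lexicon, proposals
```

Passing the expensive steps in as callables keeps the control flow separate from whatever HMM toolkit does the actual training and scoring.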

  13. Summer at AFRL - Proposal • Start with a basic dictionary • Start with an assumption that the number of phones in a word is related to the number of letters in the orthography • Basic dictionary maps word to sequence of letters in that word: ABLE → A B L E; BANNED → B A N N E D
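A letter-based starting dictionary like the one on this slide is easy to generate. The sketch below is illustrative only: `basic_lexicon` and `phoneset` are hypothetical helpers, and dropping non-letter characters such as apostrophes is an assumed choice the slide does not address.

```python
def basic_lexicon(words):
    """Letter-based starting dictionary: each word maps to the letters of its spelling."""
    lexicon = {}
    for word in words:
        key = word.upper()
        # Non-letters (apostrophes, hyphens) are dropped; this choice is an assumption.
        lexicon[key] = [ch for ch in key if ch.isalpha()]
    return lexicon


def phoneset(lexicon):
    """The starting phone set is simply the distinct letters that occur."""
    return sorted({ph for pron in lexicon.values() for ph in pron})


lex = basic_lexicon(["able", "banned"])
print(lex["BANNED"])  # ['B', 'A', 'N', 'N', 'E', 'D']
print(phoneset(lex))  # ['A', 'B', 'D', 'E', 'L', 'N']
```

Under this scheme the starting phoneset size is just the number of distinct letters seen in the training vocabulary (six in the toy example).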

  14. Summer at AFRL - Proposal • Train a set of acoustic models • Using the basic dictionary, map words in the transcript to these “pronunciations” • Train an HMM using the output of the feature detectors as its input, and the above mapping as training labels
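The mapping step, turning a word transcript into training labels, amounts to concatenating dictionary pronunciations. This is a hedged sketch with a hypothetical `transcript_to_labels` helper; the labels come out as one flat sequence with no word-boundary markers, which fits the earlier note that word boundaries need not be marked, though how the real system packaged labels for its HMM toolkit is not stated.

```python
def transcript_to_labels(transcript, lexicon):
    """Expand a word-level transcript into one flat sequence of phone labels."""
    labels = []
    for word in transcript.upper().split():
        if word not in lexicon:
            raise KeyError(f"word not in lexicon: {word}")
        labels.extend(lexicon[word])
    return labels


lexicon = {"ABLE": ["A", "B", "L", "E"], "BANNED": ["B", "A", "N", "N", "E", "D"]}
print(transcript_to_labels("able banned", lexicon))
# ['A', 'B', 'L', 'E', 'B', 'A', 'N', 'N', 'E', 'D']
```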

  15. Summer at AFRL - Proposal • Alter the basic dictionary • Using some metric, find a candidate “phone” to be modified • We’ve looked at a couple of metrics – more on this later • Once the phone is identified, see if the phone should be “split” or “deleted” • A “split” indicates that the given phone label actually represents two different sounds, and so should be replaced with two different phone labels • A “delete” indicates that for a particular word or words the model fits better if that phone label is removed from the pronunciation

  16. Summer at AFRL - Proposal • Split example (E split into E and E1): BE → B E, DEVELOP → D E1 V E1 L O P • Delete examples: ABLE → A B L E becomes ABLE → A B L; ABANDONED → A B A N D O N D
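Once a split or delete has been chosen, applying it to the lexicon is a small relabelling operation. The helpers below, `split_phone` and `delete_phone`, are hypothetical and only reproduce the slide's examples; deciding *which* words get the new label, or which position to drop, is the model-driven part and is not shown.

```python
def split_phone(lexicon, phone, new_label, words):
    """Relabel every occurrence of `phone` in `words` as `new_label`;
    all other words keep the original label."""
    out = {w: list(p) for w, p in lexicon.items()}  # copy, never mutate the input
    for w in words:
        out[w] = [new_label if ph == phone else ph for ph in out[w]]
    return out


def delete_phone(lexicon, word, position):
    """Drop the phone at `position` from a single word's pronunciation."""
    out = {w: list(p) for w, p in lexicon.items()}
    del out[word][position]
    return out


lex = {w: list(w) for w in ["BE", "DEVELOP", "ABLE", "ABANDONED"]}
print(split_phone(lex, "E", "E1", ["DEVELOP"])["DEVELOP"])  # ['D', 'E1', 'V', 'E1', 'L', 'O', 'P']
print(delete_phone(lex, "ABLE", -1)["ABLE"])                # ['A', 'B', 'L']
print(delete_phone(lex, "ABANDONED", 7)["ABANDONED"])       # ['A', 'B', 'A', 'N', 'D', 'O', 'N', 'D']
```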

  17. Summer at AFRL - Proposal • For splits, all possible alterations are added to a temporary lexicon • For deletes, we alter the HMM to add a possible deletion arc for the phone • After the lexicon or HMM is altered, the word transcript is force-aligned using the new possible pronunciations • Best pronunciations are pulled from this alignment and used to build a new lexicon • Steps are repeated using the new lexicon in place of the basic lexicon
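One reading of “all possible alterations” for a split is: every occurrence of the split phone may independently take either the old or the new label, and forced alignment then picks the best variant per word. The sketch below assumes that reading (the slide does not spell it out); `split_alternatives` and `temporary_lexicon` are hypothetical names. Deletions are not covered, since the slide handles them with HMM deletion arcs rather than lexicon entries.

```python
from itertools import product


def split_alternatives(pron, phone, new_label):
    """Every pronunciation obtained by relabelling any subset of the
    occurrences of `phone` as `new_label` (2**k variants for k occurrences)."""
    slots = [(ph, new_label) if ph == phone else (ph,) for ph in pron]
    return [list(variant) for variant in product(*slots)]


def temporary_lexicon(lexicon, phone, new_label):
    """Word -> list of candidate pronunciations to hand to forced alignment."""
    return {w: split_alternatives(p, phone, new_label) for w, p in lexicon.items()}


tmp = temporary_lexicon({"BE": ["B", "E"], "DEVELOP": list("DEVELOP")}, "E", "E1")
for pron in tmp["DEVELOP"]:
    print(" ".join(pron))
# D E V E L O P
# D E V E1 L O P
# D E1 V E L O P
# D E1 V E1 L O P
```

The variant count doubles with each occurrence of the phone in a word, so a practical system would likely cap or prune these lists for long words.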

  18. Summer at AFRL - Proposal • How do we determine the candidate “phone label” to alter? • Initially, modelled each phone with two Gaussians in the HMM • Compared the two Gaussians to each other using the KL divergence between them • Took the phone label with the largest KL divergence as the one to alter • Idea was that each Gaussian described a cluster: the farther apart the two cluster centers, the more likely the label was covering two different phones
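For diagonal-covariance Gaussians the KL divergence has a closed form, so this selection rule is cheap to compute. The sketch assumes diagonal covariances and sums both directions of KL because the divergence is asymmetric; neither detail is given on the slide, and `kl_diag`, `symmetric_kl` and `pick_candidate` are hypothetical names.

```python
import math


def kl_diag(mu0, var0, mu1, var1):
    """KL(N0 || N1) for Gaussians with diagonal covariance (per-dimension variances)."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


def symmetric_kl(g0, g1):
    """KL is asymmetric, so sum both directions (an assumed choice)."""
    return kl_diag(*g0, *g1) + kl_diag(*g1, *g0)


def pick_candidate(phone_gaussians):
    """phone_gaussians: label -> ((mu_a, var_a), (mu_b, var_b)).
    Returns the label whose two Gaussians diverge the most."""
    return max(phone_gaussians, key=lambda lbl: symmetric_kl(*phone_gaussians[lbl]))


gaussians = {
    "A": (([0.0], [1.0]), ([0.2], [1.0])),  # two nearly identical components
    "E": (([0.0], [1.0]), ([3.0], [1.0])),  # two well-separated components
}
print(pick_candidate(gaussians))  # E
```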

  19. Summer at AFRL - Proposal • KL-divergence metric did not work well • System would pick candidates that a human would find unreasonable (such as “F” or “Q”) • System would split or delete these phones multiple times, continually returning to the same phone label

  20. Summer at AFRL - Proposal • Why did the KL divergence perform this way? • Suspicion: large variations between the two Gaussians in dimensions that do not matter for that phone pushed up the scores (e.g. vowel features for consonants) • Splitting these phones only allowed the coverage to spread wider, drawing the system back to those phones
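Because the diagonal-Gaussian KL divergence is a sum of per-dimension terms, the suspicion above can be checked by looking at which dimensions dominate the score. This is a hypothetical diagnostic, not something the slides report running: the four named dimensions and all the numbers below are invented for illustration, and `kl_per_dim` and `dominant_dims` are hypothetical helpers.

```python
import math


def kl_per_dim(mu0, var0, mu1, var1):
    """Per-dimension terms of KL(N0 || N1) for diagonal Gaussians; they sum to the full KL."""
    return [
        0.5 * (v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0))
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    ]


def dominant_dims(terms, names, top=2):
    """Names of the dimensions that contribute most to the divergence."""
    ranked = sorted(zip(terms, names), reverse=True)
    return [name for _, name in ranked[:top]]


# Hypothetical two-Gaussian model for the consonant label "F": the consonant
# features agree, but the vowel features differ sharply in variance.
names = ["manner", "place", "height", "frontness"]
terms = kl_per_dim(
    [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.1, 2.0],
    [0.0, 0.0, 0.0, 0.5], [1.0, 1.0, 2.0, 0.1],
)
print(dominant_dims(terms, names))  # ['frontness', 'height']
```

In this toy case the consonant dimensions contribute nothing, yet the label still gets a large divergence purely from vowel dimensions, which is the failure mode the slide suspects.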

  21. Summer at AFRL - Proposal • What next? • Tried Mahalanobis distance metric, with poor results also • Returned to Acoustic Sub-Word papers for inspiration • Instead of looking at cluster stats, multiple papers use an average frame likelihood metric for each phone cluster to determine candidate phone for altering • Have started moving my code to use this framework – preliminary passes show promise, but no results quite yet
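An average frame likelihood metric scores each phone label by how well its model explains the frames force-aligned to it. The sketch below simplifies each label's model to a single diagonal Gaussian, where the real system used HMM state distributions, and it assumes the label with the *lowest* average log-likelihood is the candidate to alter. The slide does not state that direction, and `pick_worst_fit` and the toy data are hypothetical.

```python
import math


def diag_gauss_loglik(x, mu, var):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )


def avg_frame_loglik(frames, mu, var):
    """Mean per-frame log-likelihood of the frames aligned to one phone label."""
    return sum(diag_gauss_loglik(f, mu, var) for f in frames) / len(frames)


def pick_worst_fit(aligned, models):
    """aligned: label -> frames force-aligned to that label;
    models: label -> (mu, var). Returns the label whose frames fit worst
    (lowest average log-likelihood) plus all scores."""
    scores = {lbl: avg_frame_loglik(frames, *models[lbl]) for lbl, frames in aligned.items()}
    return min(scores, key=scores.get), scores


models = {"A": ([0.0], [1.0]), "B": ([0.0], [1.0])}
aligned = {"A": [[0.0], [0.1]], "B": [[2.0], [-2.0]]}
label, scores = pick_worst_fit(aligned, models)
print(label)  # B
```

Unlike the KL approach, this metric looks at the data actually assigned to a label, so variance in dimensions that never vary for that phone's frames adds little to the score.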

  22. Conclusion – It’s 75 miles to Dayton • Advice for those thinking of doing work at WPAFB • Working in the SCREAM Lab was great • Hundreds of processors, tons of multi-lingual corpora • Friendly people, decent work environment (if a bit dark) • Many hoops to jump through, even just for a summer student • ID badges, computer usage training, etc. • Sometimes feels like you’re working at a corporation… • until the guys in uniform come around • The base is built like a campus crossed with a prison • cinderblock is the building material of choice. • Don’t forget your ID Badge • It’s 75 miles from Columbus to Dayton
