
EE 516 Lecture 1


Presentation Transcript


  1. EE 516 Lecture 1 Geoffrey Zweig Microsoft Research 4/2/2009

  2. Our Topics Introducing today! From JHU 2002 SuperSID Final Presentation – Reynolds et al.

  3. Topic Coverage By Day • Data Representations and Models (4/23) • Vector Quantization • Gaussian Mixtures • The EM Algorithm • Speaker Identification (5/7) • Language Identification (5/7) • Hidden Markov Models (5/14) • Dynamic Programming • Building a Speech Recognizer (5/14)

  4. Language Identification – Why Do It? • Multi-lingual society • Applications should be able to deal with anyone • Businesses • Automated help systems • Reservations, account access, etc. • Travel • Airport Kiosks • Train stations • Government • Funds research to identify languages • Runs evaluations in the area

  5. How Do You Do It? [Diagram: English Acoustic Model, French Acoustic Model, …, Tamil Acoustic Model → Output Likeliest] Gaussian Mixture Models - 4/23
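To make the diagram concrete, here is a minimal sketch (not from the lecture or its reader) of the per-language acoustic-model idea, using scikit-learn's GaussianMixture as a stand-in for each language's acoustic model and random arrays in place of real cepstral frames; the helper names train_language_gmms and identify_language are illustrative only.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_language_gmms(features_by_language, n_components=8):
    """features_by_language maps a language name to a (frames, dims) feature array."""
    models = {}
    for lang, frames in features_by_language.items():
        models[lang] = GaussianMixture(n_components=n_components,
                                       covariance_type="diag",
                                       random_state=0).fit(frames)
    return models

def identify_language(models, utterance_frames):
    """Pick the language whose GMM gives the highest mean per-frame log-likelihood."""
    return max(models, key=lambda lang: models[lang].score(utterance_frames))

# Toy usage with random 13-dimensional "features" in place of real MFCC frames.
rng = np.random.default_rng(0)
train = {"English": rng.normal(0.0, 1.0, (500, 13)),
         "French":  rng.normal(0.8, 1.2, (500, 13))}
models = train_language_gmms(train)
print(identify_language(models, rng.normal(0.0, 1.0, (200, 13))))

A real system would fit the mixtures on cepstral features from conversational speech and would typically need many more components; the 4/23 session covers how such mixtures are trained with EM.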

  6. How Do You Do It? (2) “p ih n s” – probably English… “k r p s t” – probably Czech… Simple HMMs – 5/14 Language Models – 4/30 After Zissman 1996
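A rough sketch of the phone-sequence idea on this slide, roughly the phone-recognition-followed-by-language-modeling setup after Zissman 1996: decode speech into a phone string with a single phone recognizer, score that string under a bigram phone language model trained for each language, and pick the best. The phone strings below are made up, and train_bigram_lm / log_prob are illustrative names.

import math
from collections import Counter

def train_bigram_lm(phone_sequences, alpha=1.0):
    """Add-alpha smoothed phone bigram model."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for seq in phone_sequences:
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            unigrams[prev] += 1
    return bigrams, unigrams, vocab, alpha

def log_prob(model, seq):
    """Log probability of a decoded phone sequence under the bigram model."""
    bigrams, unigrams, vocab, alpha = model
    V = max(len(vocab), 1)
    return sum(math.log((bigrams[(p, n)] + alpha) / (unigrams[p] + alpha * V))
               for p, n in zip(seq, seq[1:]))

english = train_bigram_lm(["p ih n s".split(), "s ih p".split()])
czech   = train_bigram_lm(["k r p s t".split(), "s t r k".split()])
test = "p ih n".split()
print("English" if log_prob(english, test) > log_prob(czech, test) else "Czech")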

  7. How Do You Do It? (3) Same methods multiple times. Acero et al., Chapter 4 (4/23). After Zissman 1996

  8. How Do You Do It? (4) Run a complete speech recognizer in each language And we will see several other ways, and combinations! After Zissman 1996

  9. Gauging Progress – The NIST Evaluations • National Institute of Standards and Technology • Has sponsored benchmark tests in multiple language processing areas for over a decade • Topic Detection & Tracking • Content Extraction • Video Analysis • Speech Recognition • Language Identification • Speaker Identification • Machine Translation • http://www.itl.nist.gov/iad/mig/tests/ • Coordination with site funding by Defense Advanced Research Projects Agency (DARPA) • Along with business interest, the driving force in advancing the State-of-the-Art

  10. For Example, Progress in Speech Recognition

  11. Language Identification - How Well Can It Be Done – Who Salutes? From NIST 2007 LRE Website

  12. How Well Can it Be Done – What Languages? From NIST 2007 LRE Website

  13. How Well Can It Be Done? – Testing Conditions • 26 languages and dialects • Telephone speech • Multiple duration conditions • 3, 10, 30 seconds • Detection Error Tradeoff (DET) Curves used to measure performance
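The DET curves mentioned above plot miss probability against false-alarm probability as the decision threshold sweeps over the detector's scores. Here is a small sketch of how those operating points are computed; the scores are synthetic stand-ins and det_points is an illustrative name, not NIST tooling.

import numpy as np

def det_points(target_scores, nontarget_scores):
    """Return (false-alarm rate, miss rate) pairs, one per candidate threshold."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    points = []
    for t in thresholds:
        miss        = np.mean(target_scores < t)      # target trials rejected
        false_alarm = np.mean(nontarget_scores >= t)  # non-target trials accepted
        points.append((false_alarm, miss))
    return points

rng = np.random.default_rng(0)
pts = det_points(rng.normal(1.0, 1.0, 1000), rng.normal(-1.0, 1.0, 1000))
# A DET plot draws these points with both axes on a normal-deviate (probit) scale.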

  14. How Well Can it Be Done – Some Numbers From NIST 2007 LRE Website

  15. Language Identification Project • Build a language ID system with the CallFriend data set • Implement several of the main techniques • Set up a demo on your laptop that will recognize someone’s language

  16. Flavors of Speaker Recognition Our Focus! From JHU 2002 SuperSID Final Presentation – Reynolds et al.

  17. Speaker Recognition – Why Do It? • Personal Applications • Voice-print passwords • Voicemail transcription – who left that message? • Business Applications • Calling your bank • Government • Is that Osama calling from Pakistan? • Prison call monitoring • Automated parolee calling – is he where you think?

  18. How Do You Do It? The most basic approach: Gaussian Mixture Models - 4/23 More recently: Support vector machines operating on GMMs (!)
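A minimal sketch of the basic GMM approach named on this slide, assuming hypothetical feature arrays: train a background GMM on speech from many speakers and a target-speaker GMM on enrollment data, then score a test utterance by the average per-frame log-likelihood ratio between the two. (The "SVM on GMMs" variant instead feeds a fixed-length vector derived from the adapted GMM into a support vector machine.) The names fit_gmm and verification_score are illustrative only.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(frames, n_components=8):
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=0).fit(frames)

def verification_score(target_gmm, background_gmm, test_frames):
    # score() is the mean per-frame log-likelihood, so the difference is an
    # average log-likelihood ratio; accept if it exceeds a tuned threshold.
    return target_gmm.score(test_frames) - background_gmm.score(test_frames)

rng = np.random.default_rng(0)
background = fit_gmm(rng.normal(0.0, 1.5, (5000, 13)))  # "many speakers" (toy data)
target     = fit_gmm(rng.normal(0.7, 1.0, (1500, 13)))  # enrollment data (toy data)
test_frames = rng.normal(0.7, 1.0, (300, 13))
print("accept" if verification_score(target, background, test_frames) > 0.0 else "reject")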

  19. How Do You Do It? (2) Also use high-level information! From JHU 2002 SuperSID Final Presentation – Reynolds et al.

  20. How Well Can It Be Done – Who Salutes? From NIST 2008 SRE Presentation, Martin & Greenberg

  21. More Salutes From NIST 2008 SRE Presentation, Martin & Greenberg

  22. From Europe From NIST 2008 SRE Presentation, Martin & Greenberg

  23. More From Europe From NIST 2008 SRE Presentation, Martin & Greenberg

  24. U.S. Entries From NIST 2008 SRE Presentation, Martin & Greenberg

  25. How Well Can It Be Done – Testing Conditions • Conditions for different amounts of data • 10 sec. • 3-5 minutes • 8 minutes • Separate channel and summed channel conditions • English-speakers, non-English speakers, multilingual speakers

  26. How Well Can It Be Done?

  27. Speaker Verification Project • Implement a Speaker-ID system • Template based • GMM based • SVM based • Vector space model • Demonstrate it: • NIST data, e.g. 2001 Evaluation • Your own voice – implement on laptop

  28. Speech Recognition Project • Implement an HMM-based recognition system • Use, e.g., the PhoneBook isolated word data set or the Aurora digit set • Write features with existing front-end • Build your own HMM trainer/decoder • Set it up on your laptop for online word recognition (?!)
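For orientation only (the project asks you to build your own trainer/decoder), here is a sketch of the Viterbi dynamic program at the core of an HMM isolated-word recognizer: score the observed frames against each word's HMM and report the word with the best state path. The topology and per-frame state log-likelihoods below are made up, standing in for trained models and real front-end features.

import numpy as np

def viterbi_log_score(log_init, log_trans, frame_state_loglik):
    """Best-path log score for one word HMM.
    log_init: (S,) initial log-probs; log_trans: (S, S) transition log-probs;
    frame_state_loglik: (T, S) per-frame emission log-likelihoods."""
    delta = log_init + frame_state_loglik[0]
    for t in range(1, frame_state_loglik.shape[0]):
        # For each current state, keep the best predecessor, then add the emission.
        delta = np.max(delta[:, None] + log_trans, axis=0) + frame_state_loglik[t]
    return float(delta.max())

# Toy example: two word models sharing a 3-state left-to-right topology,
# with made-up per-frame state log-likelihoods standing in for GMM scores.
rng = np.random.default_rng(0)
log_init  = np.log([1.0, 1e-9, 1e-9])
log_trans = np.log([[0.6, 0.4, 1e-9],
                    [1e-9, 0.6, 0.4],
                    [1e-9, 1e-9, 1.0]])
frames = rng.normal(size=(20, 3))
word_scores = {"yes": frames + 0.3, "no": frames - 0.3}
print(max(word_scores, key=lambda w: viterbi_log_score(log_init, log_trans, word_scores[w])))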

  29. Highlights of Syllabus • Required Texts: • Huang, Acero, Hon: Spoken Language Processing • Deng and O’Shaughnessy, Speech Processing • EE516 Reader, at Professional Copy ‘n Print, 4200 University Way • Grading: • Projects: 50% • Final Exam: 30% • Homework: 20% • Projects: • Small team or individual • Teams are self-forming • Presentation times TBD • Read ahead & pick an area!!! • Talk to relevant instructor • Suggest deciding no later than 4/30 • Office Hours at end of class and by appointment • Please sign in on email list!
