This course provides an in-depth exploration of how computers interact with speech through computational approaches that enhance our understanding of language and spoken communication. Key topics include prosody modeling, speech recognition, and synthesis, with practical applications such as voice dialing, transcription, and interactive voice response systems. The course is project-driven, focusing on building a spoken dialog system using speech recognition and synthesis techniques. Students will engage in hands-on activities and address broader scientific questions regarding language perception and production.
Course Overview Lecture 1 Spoken Language Processing Prof. Andrew Rosenberg
Spoken Language Processing • How do computers interact with speech? • Computational approaches help us understand language and spoken communication. Symbolic and Direct Modeling of Prosody
Speech Recognition • Voice Dialing • Voicemail transcription • Closed Captioning • Interactive Voice Response / Spoken Dialog Systems • Keyword Spotting • Continuous Speech Recognition • Domain-Specific vs. Open-Domain
Speech Synthesis • Navigation Systems • Garmin • Google Maps • IBM Watson • Bank by phone • Spoken Dialog Systems • Screen Readers
How much information is in speech? • Words (Lexical Content) • Syntax • Semantics • Pragmatics • Speaker Identity • Gender, Personality • Speaker State • Discourse Acts
Other applications • Video retrieval • “Rich Transcription” • Speech Segmentation • Emotion Analysis • Speech-to-speech translation • Intelligence Applications • Deception • Trust • Language & Dialect Identification
Broader Scientific Questions • How do you produce sounds that others perceive as language? • How does a hearer decode what you are trying to express? • How and why do you use prosodic variation? • Phrasing • Emphasis • Intonational Contours • Emotion, sarcasm, etc. • How does smooth turn-taking happen?
What will be covered in this course • Project-driven course • Spoken Dialog System • Recognize Speech • CMU Sphinx • Make a decision • Generate Speech • Festival
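The three stages on this slide (recognize speech, make a decision, generate speech) can be sketched as a single dialog turn. This is only an illustrative sketch, not the course's project code: the functions `recognize`, `decide`, `generate`, and `dialog_turn`, and the `RULES` table, are hypothetical names. The recognizer and synthesizer are text-only placeholders standing in for the roles that CMU Sphinx and Festival play in the course project; no real Sphinx or Festival API is called.

```python
from typing import Callable, Dict


def recognize(audio: str) -> str:
    """Placeholder recognizer (assumption): the 'audio' is already text.

    In the course project this role is filled by CMU Sphinx.
    """
    return audio.strip().lower()


def decide(utterance: str, rules: Dict[str, str]) -> str:
    """Pick a response by simple keyword matching over the recognized text."""
    for keyword, response in rules.items():
        if keyword in utterance:
            return response
    return "Sorry, I did not understand that."


def generate(text: str) -> str:
    """Placeholder synthesizer (assumption): returns tagged text, no audio.

    In the course project this role is filled by Festival.
    """
    return f"[spoken] {text}"


def dialog_turn(
    audio: str,
    rules: Dict[str, str],
    asr: Callable[[str], str] = recognize,
    tts: Callable[[str], str] = generate,
) -> str:
    """Run one turn: recognize speech, make a decision, generate speech."""
    return tts(decide(asr(audio), rules))


# Hypothetical keyword-to-response table for a toy domain.
RULES = {
    "weather": "It is sunny today.",
    "time": "It is noon.",
}

if __name__ == "__main__":
    print(dialog_turn("What is the WEATHER like?", RULES))
```

Passing the recognizer and synthesizer as arguments keeps the decision logic independent of any particular speech toolkit, so each placeholder could later be swapped for a real component without changing `decide`.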
How the course is structured • Speech Recognition • Speech Synthesis • Analysis of additional information from speech • Speaker ID • Prosody/Intonation • etc.
Project and Exams • Build a Spoken Dialog System • 4 Deadlines • Project Description (and team membership) • Speech Recognition Component • Speech Synthesis Component • Full system with demo. (Start on this early) • Project writeup. • In-class midterm.
The syllabus and course policies • Course webpage: http://eniac.cs.qc.cuny.edu/andrew/slp/syllabus.html
Questions? • Policies • Course mechanics • Expectations