Applications of Speech Processing: Insights from AEGIS RET Meeting at Florida Institute of Technology

Speech Processing
Applications of Images and Signals in High Schools
AEGIS RET All-Hands Meeting
Florida Institute of Technology
July 6, 2012

Contributors
• Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu
• Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu
• Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

Timeline
• 1874: Alexander Graham Bell proves that the frequency harmonics of an electrical signal can be divided
• 1952: Bell Labs develops the first effective speech recognizer
• 1971-1976: DARPA: speech should be understood, not just recognized
• 1980s: Call center and text-to-speech products become commercially available
• 1990s: PC processing power allows use of SR software by ordinary users
Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Motivation
• Applications
  • Call center speech recognition
  • Speech-to-text applications (e.g., dictation software)
  • Hands-free user interface (e.g., OnStar, XBOX Kinect, Siri)
• Science fiction, 1968: Stanley Kubrick’s 2001: A Space Odyssey http://www.youtube.com/watch?v=6MMmYyIZlC4
• Science fact, 2011: Apple iPhone 4S Siri http://www.apple.com/iphone/features/siri.html

Difficulties
• Continuous speech (word boundaries)
• Noise
  • Background
  • Other speakers
• Differences in speakers
  • Dialects/accents
  • Male/female

Motivation
• Speech recognition requires speech to first be characterized by a set of “features”.
• Features are used to determine what words are spoken.
• Our project implements the feature extraction stage of a speech processing application.

Speech Recognition
[Diagram: Speech → Front End: Pre-processing → Features → Back End: Recognition → Recognized speech. The front end receives a large amount of data (e.g., 256 samples) and outputs a reduced data size (e.g., 13 features).]
• Front End: reduces the amount of data for the back end while keeping enough to accurately describe the signal. The output is a feature vector (256 samples → 13 features).
• Back End: statistical models classify feature vectors as a certain sound in speech.
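
The data reduction on this slide can be sketched as follows: a recording is cut into fixed-size frames, and each 256-sample frame is later reduced to 13 features. This is only an illustration, not the project's code. The function name `split_into_frames`, the 50% overlap (`hop=128`), and the 8 kHz sample rate are assumptions; the slides give only the 256-sample and 13-feature figures.

```python
def split_into_frames(signal, frame_len=256, hop=128):
    """Cut a list of samples into overlapping frames of frame_len samples.

    hop is the distance between frame starts; hop=128 (50% overlap) is an
    assumed value, not one taken from the slides.
    """
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


# One second of placeholder audio at an assumed 8 kHz sample rate.
signal = [0.0] * 8000
frames = split_into_frames(signal)

samples_in = len(frames) * 256    # values handed to the front end
features_out = len(frames) * 13   # values handed to the back end
print(len(frames), samples_in, features_out)
```

A signal shorter than one frame yields an empty list, so a caller would need to pad short recordings before framing.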

Front-End Processing of Speech Recognizer
Stages: Pre-emphasis → Window → FFT → Mel-Scale → log → IFFT
• Pre-emphasis
  • High-pass filter to compensate for the higher-frequency roll-off in human speech
• Window
  • Separate speech signal into frames
  • Apply window to smooth edges of framed speech signal
• FFT
  • Transform signal from time domain to frequency domain
  • Human ear perceives sound based on frequency content
• Mel-Scale
  • Convert linear scale frequency (Hz) to logarithmic scale (mel-scale)
• log
  • Take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
• IFFT
  • Inverse of FFT to transform to the cepstral domain; the result is the set of “features”
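
The stages above can be sketched for a single frame as below. This is a minimal illustration under stated assumptions, not the SASE project's MATLAB code. The pre-emphasis coefficient 0.97, the Hamming window, the 26 triangular mel filters, the mel formula 2595·log10(1 + f/700), and the 8 kHz sample rate are common choices assumed here; the slides do not specify them. A naive DFT stands in for the FFT, and the final step uses a DCT-II, a common substitute for the IFFT named on the slide.

```python
import cmath
import math


def pre_emphasis(frame, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1]
                         for n in range(1, len(frame))]


def hamming_window(frame):
    """Taper the frame edges with a Hamming window."""
    size = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)))
            for n, x in enumerate(frame)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, not a true FFT)."""
    size = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                    for n, x in enumerate(frame)))
            for k in range(size // 2 + 1)]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_energies(magnitudes, sample_rate, n_filters=26):
    """Sum spectral power under triangular filters spaced evenly in mel."""
    n_fft = 2 * (len(magnitudes) - 1)
    top = hz_to_mel(sample_rate / 2.0)
    # Filter edges expressed as (fractional) DFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for k, mag in enumerate(magnitudes):
            if lo < k <= mid:        # rising edge of the triangle
                total += (k - lo) / (mid - lo) * mag ** 2
            elif mid < k < hi:       # falling edge of the triangle
                total += (hi - k) / (hi - mid) * mag ** 2
        energies.append(total)
    return energies


def cepstral_features(energies, n_features=13):
    """Log of the mel energies, then a DCT-II to reach the cepstral domain."""
    logs = [math.log(e + 1e-10) for e in energies]  # offset avoids log(0)
    count = len(logs)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / count)
                for j, v in enumerate(logs))
            for c in range(n_features)]


def extract_features(frame, sample_rate=8000):
    """Run one frame through every stage and return its feature vector."""
    windowed = hamming_window(pre_emphasis(frame))
    magnitudes = magnitude_spectrum(windowed)
    return cepstral_features(mel_energies(magnitudes, sample_rate))


# A 256-sample frame of a 440 Hz tone stands in for real speech.
frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
features = extract_features(frame)
print(len(features))
```

Each stage is a separate function so that its intermediate output can be inspected or plotted on its own, which mirrors the stage-by-stage display the project describes.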

Speech Analysis and Sound Effects (SASE) Project
• Graphical User Interface (GUI)
• Speech input
  • Record and save audio
  • Sound file (*.wav, *.ulaw, *.au)
• Graphs the entire audio signal
• Select a “frame” by clicking on graph
• Process speech frame and display output for each stage of processing
• Displays spectrogram

GUI Components
[Figure: SASE GUI with labeled components: plotting axes, buttons]

MATLAB Code
• Graphical User Interface (GUI)
  • GUIDE (GUI Development Environment)
  • Callback functions
• Work in progress
• Extendable
• Stages of speech processing
  • Modular functions for reusability

SASE Lab
• Interactive teaching tool
• Demo

Future Work
• Improve GUI
• Audio effects
  • Ex: echo, reverberation, chorus, flange
• Noise filtering
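
Of the effects listed, echo is the simplest to sketch: a delayed, attenuated copy of the signal is added back to the original. The function name `add_echo` and the delay and gain values are illustrative assumptions, since the slides list the effect only as future work.

```python
def add_echo(signal, delay_samples, gain=0.5):
    """Mix a delayed, attenuated copy of the signal into itself.

    Implements y[n] = x[n] + gain * x[n - delay_samples], treating samples
    before the start of the signal as silence.
    """
    output = list(signal)
    for n in range(delay_samples, len(signal)):
        output[n] += gain * signal[n - delay_samples]
    return output


# A single click followed by silence: the echo appears 3 samples later.
print(add_echo([1.0, 0.0, 0.0, 0.0, 0.0], delay_samples=3))
# [1.0, 0.0, 0.0, 0.5, 0.0]
```

Reverberation, chorus, and flange could build on the same delay idea with several or time-varying delays, though that is beyond this sketch.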

References
• Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.
• Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010.
• Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007.
• Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Thank you! Questions?

Unit Plan
• Introduction
• Lesson #1: The Sound of a Sine Wave
• Lesson #2: Frequency Analysis
• Lesson #3: Filtering (work in progress)
• Lesson #4: SASE Lab (work in progress)
• Conclusion
