Speech Processing

Speech Processing Applications of Images and Signals in High Schools AEGIS RET All-Hands Meeting University of Central Florida July 20, 2012

Contributors Dr. Veton Këpuska, Faculty Mentor, FIT vkepuska@fit.edu Jacob Zurasky, Graduate Student Mentor, FIT jzuraksy@my.fit.edu Becky Dowell, RET Teacher, BPS Titusville High dowell.jeanie@brevardschools.org

Speech Processing Project • Speech recognition requires speech to first be characterized by a set of “features” • Features are used to determine what words are spoken. • Our project implements the feature extraction stage of a speech processing application.

Timeline • 1874: Alexander Graham Bell proves frequency harmonics from electrical signal can be divided • 1952: Bell Labs develops first effective speech recognizer • 1971-1976: DARPA: speech should be understood, not just recognized • 1980’s: Call center and text-to-speech products commercially available • 1990’s: PC processing power allows use of SR software by ordinary user • Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Applications • Call center speech recognition • Speech-to-text applications • Dictation software • Visual voice mail • Hands-free user-interface • Siri http://www.apple.com/iphone/features/siri.html • OnStar • XBOX Kinect • Medical Applications • Parkinson’s Voice Initiative • Detection of sleep disorders

Difficulties • Continuous speech (word boundaries) • Noise: background, other speakers • Differences in speakers: dialects/accents, male/female

Speech Recognition • Block diagram: Speech → Front End (pre-processing) → Features → Back End (recognition) → Recognized speech • Speech: large amount of data, e.g. 256 samples • Features: reduced data size, e.g. 13 features • Front End: reduces the amount of data for the back end, but keeps enough data to accurately describe the signal. Output is a feature vector (256 samples → 13 features). • Back End: statistical models used to classify feature vectors as a certain sound in speech

Front-End Processing of Speech Recognizer • Stages: Pre-emphasis → Window → FFT → Mel-Scale → log → IFFT • Pre-emphasis: high-pass filter to compensate for the higher-frequency roll-off in human speech • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed speech signal • FFT: transform the signal from the time domain to the frequency domain (the human ear perceives sound based on frequency content) • Mel-Scale: convert linear-scale frequency (Hz) to a logarithmic scale (mel-scale) • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals • IFFT: inverse of the FFT to transform to the cepstral domain; the result is the set of “features”
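The front-end stages listed above can be sketched in a few lines of MATLAB. This is a minimal illustration, not the SASE Lab code: the 8000 Hz sampling rate, the pre-emphasis coefficient 0.97, the 20 triangular mel filters, and the synthetic test frame are all assumptions chosen for the example. The last step uses a discrete cosine transform, a common choice in mel-cepstral front ends, in place of the IFFT named in the slides.

```matlab
% Sketch of the front-end stages on one 256-sample frame (illustrative values).
fs = 8000;                 % sampling rate in Hz (assumed)
N  = 256;                  % frame length in samples, as in the slides
n  = (0:N-1)';
frame = sin(2*pi*300*n/fs) + 0.5*sin(2*pi*1200*n/fs);   % synthetic test frame

% 1. Pre-emphasis: first-order high-pass filter y[n] = x[n] - a*x[n-1]
a = 0.97;                  % common textbook coefficient (assumed)
pre = filter([1 -a], 1, frame);

% 2. Window: Hamming window written out so no toolbox is needed
w = 0.54 - 0.46*cos(2*pi*n/(N-1));
windowed = pre .* w;

% 3. FFT: keep the magnitude of the non-negative frequencies
spec = abs(fft(windowed));
spec = spec(1:N/2+1);
freqs = (0:N/2)' * fs / N;          % frequency of each FFT bin in Hz

% 4. Mel scale: triangular filters spaced evenly in mel
hz2mel = @(f) 2595*log10(1 + f/700);
mel2hz = @(m) 700*(10.^(m/2595) - 1);
numFilters = 20;                    % assumed filter count
edges = mel2hz(linspace(hz2mel(0), hz2mel(fs/2), numFilters + 2));
melEnergy = zeros(numFilters, 1);
for k = 1:numFilters
    lo = edges(k); mid = edges(k+1); hi = edges(k+2);
    rising  = (freqs - lo) / (mid - lo);
    falling = (hi - freqs) / (hi - mid);
    tri = max(0, min(rising, falling));      % triangle peaking at mid
    melEnergy(k) = sum(tri .* spec.^2);
end

% 5. log: compress the filter-bank energies (eps guards against log(0))
logMel = log(melEnergy + eps);

% 6. Transform to the cepstral domain. A DCT-II is swapped in here for the
%    IFFT named in the slides; this is an assumption of the sketch.
numCoeffs = 13;
kk = (0:numCoeffs-1)';
mm = (1:numFilters) - 0.5;
dctMatrix = cos(pi * kk * mm / numFilters);   % 13-by-20 matrix
features = dctMatrix * logMel;                % 13-element feature vector
disp(size(features));
```

The frame goes in as 256 samples and comes out as 13 numbers, which mirrors the data reduction the front end is meant to provide.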

Speech Analysis and Sound Effects (SASE) Project • Implements front-end pre-processing (feature extraction) • Graphical User Interface (GUI) • Speech input • Record and save audio • Read sound file (*.wav, *.ulaw, *.au) • Graphs the entire audio signal • Processes a user-selected speech frame and displays graphs of the output for each stage • Displays a spectrogram of the entire signal and of a user-selected 3-second sample • Modifies speech with user-configurable audio effects

MATLAB Code • Graphical User Interface (GUI) • GUIDE (GUI Development Environment) • Callback functions • Front-end speech processing • Modular functions for reusability • Graphs of output for each stage • Sound Effects • Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer

GUI Components • Buttons • Plotting Axes

SASE Lab Demo • Record, play, save audio to file, open existing audio files • Select and process a speech frame, display graphs of the stages of front-end processing • Display spectrogram for the entire speech signal or a user-selectable 3-second sample • Play speech: all or the selected 3-second sample • Show how certain sounds (e.g. “a e i o u”) differ in the spectrogram and the features, so the audience understands what these graphs tell us about the sounds • Apply sound effects, show user-configurable parameters • Graph the spectrogram and speech-processing stages for the sound effects • Show echo effect in spectrogram • Use as teaching tool

Future Work on SASE Lab • Audio Effect - Pitch extraction • Noise Filtering

Applications of Signal Processing in High Schools • Convey the relevance and importance of math to high school students • Bring knowledge of technological innovation and academic research into high school classrooms • Provide opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing • Encourage students to pursue higher education and careers in STEM fields

Unit Plan: Speech Processing • Collection of lesson plans that introduce high school students to the fundamentals of speech and sound processing • Connections to Pre-Calculus Course, NGSSS and Common Core Mathematics Standards • Mathematical Modeling • Trigonometric Functions • Complex Numbers in Rectangular and Polar Form • Function Operations • Logarithmic Functions • Sequences and Series • Matrices

Unit Plan: Speech Processing • Cohesive unit of four lessons • The Sound of a Sine Wave • Frequency Analysis • Sound Effects • SASE Lab • Hands-on lessons • Teacher notes • MATLAB projects

Unit Introduction • Students research, explore, and discuss current applications of speech and audio processing

Lesson 1: The Sound of a Sine Wave • Modeling sound as a sinusoidal function • Concepts covered: • Continuous vs. Discrete Functions • Frequency of Sine Wave • Composite signals • Connections to real-world applications: • Synthesis of digital speech and music

Lesson 1: The Sound of a Sine Wave • Student MATLAB Project • Create discrete sine waves with given frequencies • Create composite signal of the sine waves • Plot graphs and play sounds of the sine waves • Analyze the effect of frequency and amplitude on the graphs and the sounds of the sine functions

Lesson 1: The Sound of a Sine Wave % plays C4, C5, C6 - frequency doubles with each octave % sine_sound_sample(8000, 261.626, 523.251, 1046.500, 1);
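The project steps can be tried without the lesson's helper function. The sketch below is not the implementation of sine_sound_sample; it only borrows the values from the call above, and it assumes that 8000 is the sampling rate in Hz and that the final 1 is a duration in seconds.

```matlab
% Build three discrete sine waves one octave apart and their composite.
fs  = 8000;                          % sampling rate in Hz (assumed meaning)
dur = 1;                             % duration in seconds (assumed meaning)
t   = (0:round(fs*dur)-1) / fs;      % discrete time axis
f   = [261.626 523.251 1046.500];    % C4, C5, C6 in Hz

tones = sin(2*pi * f' * t);              % one row per note (3-by-8000)
composite = sum(tones, 1) / numel(f);    % averaging keeps samples in [-1, 1]

% Plot the first 10 ms of each tone and of the composite.
idx = 1:round(0.010*fs);
subplot(2,1,1); plot(t(idx), tones(:, idx)'); title('C4, C5, C6');
subplot(2,1,2); plot(t(idx), composite(idx)); title('Composite');
xlabel('Time (s)');

% Uncomment to listen (needs an audio device):
% sound(composite, fs);
```

Changing an entry of f shifts the pitch of that tone, while scaling a row of tones changes its loudness; both effects are visible in the plots.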

Lesson 1: The Sound of a Sine Wave • Project Extension – Music Notes % twinkle twinkle little star % music = 'C4Q C4Q G4Q G4Q A4Q A4Q G4H '; % super mario bros % music = 'FS4+EN5,Q E4,Q E4,Q RR,Q E4,Q RR,Q C4,Q E4,Q RR,Q G4,Q';
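One way to turn note names into sine-wave frequencies is the equal-temperament formula f = 440 · 2^((m − 69)/12), where m is the MIDI note number and A4 = 440 Hz. The sketch below applies it to the first string format above. The token layout (letter, octave digit, then Q or H for quarter or half note) and the tempo are assumptions, and the parser is hypothetical: it does not handle sharps, rests, or the comma-separated format of the second example.

```matlab
% Convert note tokens such as 'C4Q' to frequencies and a playable signal.
music  = 'C4Q C4Q G4Q G4Q A4Q A4Q G4H ';
tokens = strsplit(strtrim(music), ' ');

fs      = 8000;             % sampling rate in Hz (assumed)
quarter = 0.4;              % seconds per quarter note (assumed tempo)
names   = 'C D EF G A B';   % position in this string minus 1 = semitones above C

freqs = zeros(1, numel(tokens));
song  = [];
for i = 1:numel(tokens)
    tok    = tokens{i};
    semis  = strfind(names, tok(1)) - 1;      % e.g. 'G' -> 7
    octave = str2double(tok(2));
    midi   = 12*(octave + 1) + semis;         % MIDI note number, C4 = 60
    freqs(i) = 440 * 2^((midi - 69)/12);      % equal temperament, A4 = 440 Hz
    if tok(3) == 'H', beats = 2; else, beats = 1; end
    t = (0:round(beats*quarter*fs)-1) / fs;
    song = [song, sin(2*pi*freqs(i)*t)];      % append this note
end

% Uncomment to listen (needs an audio device):
% sound(song, fs);
```

With this formula C4 comes out at about 261.63 Hz, in line with the value used in the earlier sine_sound_sample call.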

Lesson 1: The Sound of a Sine Wave • Project Extension – Vowel Sounds • Vowel sounds characterized by lower three formants • aa “Bob” aa_m = struct('F1', 750, 'F2', 1150, 'F3', 2400, 'Duration', 215, 'W1', 1, 'W2', 1, 'W3', 1); • iy “Beat” iy_m = struct('F1', 340, 'F2', 2250, 'F3', 3000, 'Duration', 196, 'W1', 1, 'W2', 30, 'W3', 30);

Lesson 2: Frequency Analysis • Use of the Fourier transform to take functions from the time domain to the frequency domain • Concepts covered: • Modeling harmonic signals as a series of sinusoids • Sine wave decomposition • Fourier Transform • Euler’s Formula • Frequency spectrum • Connections to real-world applications: • Speech processing and recognition

Lesson 2: Frequency Analysis • Student MATLAB Project • Create a composite signal with the sum of harmonic sine waves • Plot graphs and play sounds of the sine waves • Compute the FFT of the composite signal • Plot and analyze the frequency spectrum

Lesson 2: Frequency Analysis % create five harmonic signals with fundamental frequency 262 % square_wave(8000, 262, 1, 1024);
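The Lesson 2 project can be sketched as follows. This is not the square_wave function from the lesson: it assumes the five harmonics are the odd ones (1, 3, 5, 7, 9) with amplitudes 1/k, as in the Fourier series of a square wave, and it assumes the arguments of the call above mean sampling rate, fundamental frequency, duration in seconds, and FFT length.

```matlab
% Sum five odd harmonics of 262 Hz and inspect the frequency spectrum.
fs   = 8000;      % sampling rate in Hz (assumed meaning)
f0   = 262;       % fundamental frequency in Hz
dur  = 1;         % duration in seconds (assumed meaning)
nfft = 1024;      % FFT length (assumed meaning)

t = (0:round(fs*dur)-1) / fs;
k = 1:2:9;                                  % odd harmonics 1, 3, 5, 7, 9
x = (1 ./ k) * sin(2*pi*f0 * k' * t);       % weighted sum, 1-by-8000

X = abs(fft(x(1:nfft), nfft));              % magnitude spectrum of first 1024 samples
X = X(1:nfft/2+1);                          % keep non-negative frequencies
f = (0:nfft/2) * fs / nfft;                 % frequency of each bin in Hz

[~, peak] = max(X);
fprintf('Strongest component near %.1f Hz\n', f(peak));

plot(f, X); xlabel('Frequency (Hz)'); ylabel('Magnitude');
% Uncomment to listen (needs an audio device):
% sound(x / max(abs(x)), fs);
```

The plot should show five peaks near 262, 786, 1310, 1834 and 2358 Hz, with heights falling off roughly as 1/k.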

Lesson 3: Sound Effects • Time-delay based sound effects • Concepts covered: • Discrete functions • Time-delay functions • Function operations • Connections to real-world applications: • Digital music effects and speech sound effects

Lesson 3: Sound Effects • Student MATLAB Project • Read a *.wav file • Use a delay function to modify the signal with an echo sound effect • Plot graphs and play sounds of the signals • Analyze the effect of changing parameters on the graphs and the sounds of the functions

Lesson 3: Sound Effects % echo at 50 m with reflection coefficient = 0.5 % echo_effect('becky.wav', 50, 0.5);
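A single echo can be written as the delay equation y[n] = x[n] + α·x[n − D]. The sketch below is not the echo_effect function from the lesson. It assumes the 50 m is the distance to the reflecting surface, so the sound travels out and back, and it uses 343 m/s for the speed of sound. A decaying tone stands in for the recorded speech so the example runs without a sound file.

```matlab
% Echo via a time delay: y[n] = x[n] + alpha * x[n - D]
fs = 8000;                                % sampling rate in Hz (assumed)
t  = (0:fs-1) / fs;                       % 1 second of samples
x  = sin(2*pi*440*t) .* exp(-8*t);        % decaying tone stands in for speech
% To use a recording instead: [x, fs] = audioread('becky.wav'); x = x(:,1)';

distance = 50;       % metres to the reflecting surface
alpha    = 0.5;      % reflection coefficient
c        = 343;      % speed of sound in air, m/s (approximate)

delaySec = 2*distance / c;                % out and back (assumed interpretation)
D = round(delaySec * fs);                 % delay in samples

y = [x, zeros(1, D)];                     % leave room for the echo tail
y(D+1:end) = y(D+1:end) + alpha * x;      % add the delayed, attenuated copy
y = y / max(abs(y));                      % normalise to avoid clipping

plot((0:numel(y)-1)/fs, y); xlabel('Time (s)'); ylabel('Amplitude');
% Uncomment to listen (needs an audio device):
% sound(y, fs);
```

Increasing distance moves the echo later in time, and increasing alpha makes it louder, which is the kind of parameter change the project asks students to analyze.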

Lesson 4: SASE Lab • Guided inquiry of SASE Lab program • Experiment with different sound inputs • Analyze spectrogram • Make connections to previous lessons

Unit Conclusion • Students summarize and reflect on lessons in a presentation and report/poster

References • Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007. • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010. • Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007. • Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

AEGIS Project • AEGIS website: http://research2.fit.edu/aegis-ret/ • Contacts: • Becky Dowell, dowell.jeanie@brevardschools.org • Dr. Veton Këpuska, vkepuska@fit.edu • Jacob Zurasky, jzuraksy@my.fit.edu

Thank you! Questions?
