Advancements in Speech Processing: Applications and Techniques in High School Education

Speech Processing • Applications of Images and Signals in High Schools • AEGIS RET All-Hands Meeting • University of Central Florida • July 20, 2012

Contributors • Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu • Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu • Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

Speech Processing Project • Speech recognition requires speech to first be characterized by a set of “features” • Features are used to determine what words are spoken. • Our project implements the feature extraction stage of a speech processing application.

Applications • Call center speech recognition • Speech-to-text applications • Dictation software • Visual voice mail • Hands-free user interface • Siri (http://www.apple.com/iphone/features/siri.html) • OnStar • XBOX Kinect • Medical applications • Parkinson’s Voice Initiative

Difficulties • Differences in speakers (dialects/accents, male/female) • Continuous speech (word boundaries) • Noise (background, other speakers)

Speech Recognition • Pipeline: Speech → Front End (pre-processing) → Features → Back End (recognition) → Recognized speech • Front-end input is a large amount of data (e.g., 256 samples); front-end output is a reduced data size (e.g., 13 features) • Front End: reduces the amount of data for the back end, but keeps enough data to accurately describe the signal. Output is a feature vector. • Back End: statistical models used to classify feature vectors as a certain sound in speech

Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for higher-frequency roll-off in human speech
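The pre-emphasis stage can be sketched as a first-order high-pass filter. This is a minimal illustration in Python rather than the project's MATLAB code; the function name `pre_emphasis` and the coefficient 0.97 are assumptions (0.97 is a commonly used value, not one stated in the slides).

```python
def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1].

    coeff = 0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor, so pass it through
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


# A constant (0 Hz) input is almost cancelled after the first sample,
# while an input that alternates every sample (the highest representable
# frequency) is nearly doubled in amplitude.
dc = pre_emphasis([1.0] * 5)
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0, 1.0])
```

Comparing `dc` with `alternating` shows why this filter boosts the high-frequency content that rolls off in speech.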

Front-End Processing of Speech Recognizer • Window: separate speech signal into frames; apply window to smooth edges of framed speech signal
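Framing and windowing might look like the following Python sketch. The 256-sample frame length comes from the example earlier in the deck; the 128-sample hop, the choice of a Hamming window, and all function names are assumptions made for illustration.

```python
import math


def split_into_frames(signal, frame_len=256, hop=128):
    """Return overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming_window(length):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1)), for length > 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame, window):
    """Multiply each sample by the matching window value."""
    return [s * w for s, w in zip(frame, window)]


signal = [1.0] * 1000                      # placeholder signal
frames = split_into_frames(signal)         # frames of 256 samples, 50% overlap
windowed = apply_window(frames[0], hamming_window(256))
```

The window tapers each frame toward its edges, which reduces the artificial discontinuities created by cutting the signal into pieces.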

Front-End Processing of Speech Recognizer • FFT: transform signal from time domain to frequency domain; human ear perceives sound based on frequency content
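To make the time-to-frequency step concrete, here is a small Python sketch that evaluates the discrete Fourier transform directly from its definition. An FFT computes the same values much faster; the direct form is used only because it is easy to read. The 8000 Hz sampling rate, the 1000 Hz test tone, and the function name are assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2, computed directly from the definition (O(N^2))."""
    n_samples = len(frame)
    mags = []
    for k in range(n_samples // 2 + 1):
        total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples))
        mags.append(abs(total))
    return mags


fs = 8000          # assumed sampling rate in Hz
n_samples = 256
tone_hz = 1000     # falls exactly on bin 1000 * 256 / 8000 = 32
frame = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(n_samples)]

mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * fs / n_samples   # convert the bin index back to Hz
```

The single peak in `mags` sits at the bin that corresponds to the tone's frequency, which is the information the later stages work with.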

Front-End Processing of Speech Recognizer • Mel-Scale: convert linear scale frequency (Hz) to logarithmic scale (mel-scale)
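One widely used Hz-to-mel conversion is mel = 2595 · log10(1 + f / 700). The slides do not say which formula the project uses, so the Python sketch below is an assumption, as are the function names.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel with the common 2595 * log10(1 + f/700) formula (assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Frequencies in Hz that are evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]


# Ten points between 0 and 4000 Hz: closely spaced at low frequencies,
# increasingly far apart at high frequencies.
centers = mel_spaced_frequencies(0.0, 4000.0, 10)
```

Points that are equally spaced in mel crowd together at low frequencies, mirroring the ear's finer resolution there.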

Front-End Processing of Speech Recognizer • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
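The phrase "multiplication becomes addition" can be checked numerically. In the Python sketch below, the two magnitude lists are made-up values standing in for a rapidly varying excitation and a smooth vocal-tract envelope; that interpretation is the usual motivation for cepstral analysis and is stated here as an assumption, not taken from the slides.

```python
import math

source = [4.0, 1.0, 4.0, 1.0]   # hypothetical excitation magnitudes (fast ripple)
filt = [8.0, 4.0, 2.0, 1.0]     # hypothetical vocal-tract magnitudes (smooth envelope)

# The observed magnitude spectrum is the product of the two.
spectrum = [s * f for s, f in zip(source, filt)]

# After the log, the same spectrum is the SUM of the two components,
# so later linear processing can pull them apart.
log_spectrum = [math.log(m) for m in spectrum]
log_sum = [math.log(s) + math.log(f) for s, f in zip(source, filt)]
```

`log_spectrum` and `log_sum` agree bin by bin, which is the property the next stage relies on.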

Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for higher-frequency roll-off in human speech • Window: separate speech signal into frames; apply window to smooth edges of framed speech signal • FFT: transform signal from time domain to frequency domain; human ear perceives sound based on frequency content • Mel-Scale: convert linear scale frequency (Hz) to logarithmic scale (mel-scale) • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals • IFFT: inverse of FFT to transform to the cepstral domain; the result is the set of “features”
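The last two stages (log, then inverse transform) can be sketched in Python as a plain real cepstrum. This simplified version skips the mel-scale step and uses a direct DFT instead of an FFT, so it is an illustration of the idea rather than the project's feature extractor. The function name, the small floor added before the log, and the test frame are assumptions; the 256-sample frame and 13 features follow the example given earlier in the deck.

```python
import cmath
import math


def real_cepstrum(frame, num_features=13):
    """Inverse DFT of the log magnitude spectrum; returns the first num_features values."""
    n = len(frame)
    # Forward DFT (time domain -> frequency domain).
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Log of the magnitudes; the tiny floor avoids log(0).
    log_mag = [math.log(abs(x) + 1e-12) for x in spectrum]
    # Inverse DFT (log spectrum -> cepstral domain); the result is real for real input.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
            for q in range(num_features)]


# A decaying 500 Hz tone sampled at an assumed 8000 Hz: 256 samples in, 13 features out.
frame = [math.exp(-n / 50.0) * math.sin(2 * math.pi * 500 * n / 8000) for n in range(256)]
features = real_cepstrum(frame)
```

The data reduction is visible in the sizes alone: `frame` holds 256 values, `features` holds 13.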

Speech Analysis and Sound Effects (SASE) Project • Implements front-end pre-processing (feature extraction) • Graphical User Interface (GUI) • Speech input: record and save audio, or read a sound file (*.wav, *.ulaw, *.au) • Graphs the entire audio signal • Processes a user-selected speech frame and displays graphs of the output of each stage • Displays a spectrogram of the entire signal and of a user-selected 3-second segment • Modifies speech with user-configurable audio effects

MATLAB Code • Graphical User Interface (GUI) • GUIDE (GUI Development Environment) • Callback functions • Front-end speech processing • Modular functions for reusability • Graphs of output for each stage • Sound Effects • Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer

GUI Components • [Figure: GUI screenshot with callouts labeled Buttons and Plotting Axes]

SASE Lab Demo • Record, play, save audio to file, open existing audio files • Select and process a speech frame, display graphs of the stages of front-end processing • Display spectrogram for the entire speech signal or a user-selectable 3-second sample • Play speech: all or the selected 3-second sample • Apply sound effects, show user-configurable parameters • Graph the spectrogram and speech processing of the signal after sound effects

SASE Lab

Applications of Signal Processing in High Schools • Convey the relevance and importance of math to high school students • Bring knowledge of technological innovation and academic research into high school classrooms • Provide opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing • Encourage students to pursue higher education and careers in STEM fields

Unit Plan: Speech Processing • Collection of lesson plans that introduces high school students to the fundamentals of speech and sound processing • Cohesive unit of four lessons • The Sound of a Sine Wave • Frequency Analysis • Sound Effects • SASE Lab • Hands-on lessons • Teacher notes • MATLAB projects

Unit Plan: Speech Processing • Connections to Pre-Calculus Course • Mathematical Modeling • Trigonometric Functions • Complex Numbers in Rectangular and Polar Form • Function Operations • Logarithmic Functions • Sequences and Series • NGSSS and Common Core Mathematics Standards

Unit Introduction • Students research, explore, and discuss current applications of speech and audio processing

Lesson 1: The Sound of a Sine Wave • Modeling sound as a sinusoidal function • Continuous vs. Discrete Functions • Frequency of Sine Wave • Composite signals • Connections to real-world applications: • Synthesis of digital speech and music

Lesson 1: The Sound of a Sine Wave • Student MATLAB Project • Create discrete sine waves with given frequencies • Create composite signal of the sine waves • Plot graphs and play sounds of the sine waves • Analyze the effect of frequency and amplitude on the graphs and the sounds of the sine functions

Lesson 1: The Sound of a Sine Wave
% plays C4, C5, C6 - frequencies double between octaves
% sine_sound_sample(8000, 261.626, 523.251, 1046.500, 1);
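The student project for this lesson could be approximated with the Python sketch below, which builds discrete sine waves for C4, C5, and C6 and a composite of the three. It is not the `sine_sound_sample` function from the slides; the helper names, the 1-second duration, and the reading of 8000 as the sampling rate in Hz are assumptions.

```python
import math


def sine_wave(freq_hz, duration_s, fs=8000, amplitude=1.0):
    """Discrete samples of amplitude * sin(2*pi*f*t) taken at t = n / fs."""
    count = int(round(duration_s * fs))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs) for n in range(count)]


def composite(*waves):
    """Sample-by-sample average of equal-length signals.

    Averaging (rather than summing) keeps the result within [-1, 1]
    when every input stays within [-1, 1].
    """
    return [sum(samples) / len(waves) for samples in zip(*waves)]


c4 = sine_wave(261.626, 1.0)
c5 = sine_wave(523.251, 1.0)    # one octave above C4: about twice the frequency
c6 = sine_wave(1046.500, 1.0)   # two octaves above C4
chord = composite(c4, c5, c6)
```

Plotting `c4` next to `c5` shows twice as many cycles in the same time span, which is what students hear as an octave.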

Lesson 1: The Sound of a Sine Wave • Project Extension – Music Notes
% twinkle twinkle little star
% music = 'C4Q C4Q G4Q G4Q A4Q A4Q G4H ';
% super mario bros
% music = 'FS4+EN5,Q E4,Q E4,Q RR,Q E4,Q RR,Q C4,Q E4,Q RR,Q G4,Q';
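A note string like the first example could be turned into frequencies with the Python sketch below. The token layout (note letter, octave digit, then `Q` or `H`) is inferred from the "twinkle twinkle" string, and reading `Q` as a quarter note and `H` as a half note is an assumption. The second, comma-separated format is not handled. Frequencies use equal temperament with A4 = 440 Hz, which reproduces the C4 and C5 values used in this lesson.

```python
SEMITONES = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}
BEATS = {'Q': 1.0, 'H': 2.0}   # assumed: Q = quarter note, H = half note


def note_frequency(letter, octave):
    """Equal-tempered frequency in Hz, with A4 = 440 Hz."""
    midi = 12 * (octave + 1) + SEMITONES[letter]   # MIDI note number, A4 = 69
    return 440.0 * 2.0 ** ((midi - 69) / 12)


def parse_music(music):
    """Turn 'C4Q G4H ...' into a list of (frequency_hz, beats) pairs."""
    notes = []
    for token in music.split():
        letter, octave, length = token[0], int(token[1]), token[2]
        notes.append((note_frequency(letter, octave), BEATS[length]))
    return notes


melody = parse_music('C4Q C4Q G4Q G4Q A4Q A4Q G4H ')
```

Each pair in `melody` can then be rendered as a sine wave whose duration is proportional to its beat count.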

Lesson 1: The Sound of a Sine Wave • Project Extension – Vowel Sounds • Vowel sounds characterized by lower three formants
• aa “Bob”
aa_m = struct('F1', 750, 'F2', 1150, 'F3', 2400, 'Duration', 215, 'W1', 1, 'W2', 1, 'W3', 1);
• iy “Beat”
iy_m = struct('F1', 340, 'F2', 2250, 'F3', 3000, 'Duration', 196, 'W1', 1, 'W2', 30, 'W3', 30);

Lesson 2: Frequency Analysis • Use of the Fourier transform to convert functions from the time domain to the frequency domain • Modeling harmonic signals as a series of sinusoids • Sine wave decomposition • Fourier Transform • Euler’s Formula • Frequency spectrum • Connections to real-world applications: • Speech processing and recognition

Lesson 2: Frequency Analysis • Student MATLAB Project • Create a composite signal with the sum of harmonic sine waves • Plot graphs and play sounds of the sine waves • Compute the FFT of the composite signal • Plot and analyze the frequency spectrum

Lesson 2: Frequency Analysis
% create five harmonic signals with fundamental frequency 262
% square_wave(8000, 262, 1, 1024);
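The idea behind this project can be sketched in Python: sum a few harmonics, then look for them in the spectrum. This is not the `square_wave` function from the slides. It assumes the harmonics are the odd multiples of the fundamental weighted by 1/k (the Fourier series of a square wave), uses a direct DFT for readability, and picks a 250 Hz fundamental with 256 samples so that every harmonic lands exactly on a DFT bin. With 262 Hz the peaks would fall between bins and smear across neighbors.

```python
import cmath
import math


def square_wave_approx(fs, f0, num_harmonics, length):
    """Sum of the first num_harmonics odd harmonics of f0, each weighted by 1/k."""
    orders = [2 * i + 1 for i in range(num_harmonics)]   # 1, 3, 5, ...
    return [sum(math.sin(2 * math.pi * k * f0 * n / fs) / k for k in orders)
            for n in range(length)]


def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..N/2, computed directly from the definition."""
    n_samples = len(signal)
    return [abs(sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples)))
            for k in range(n_samples // 2 + 1)]


fs, f0, length = 8000, 250, 256   # assumed values; see the note above
signal = square_wave_approx(fs, f0, 5, length)
mags = dft_magnitudes(signal)

# Frequencies (Hz) of the bins that carry significant energy.
peaks = [k * fs / length for k, m in enumerate(mags) if m > 1.0]
```

The entries of `peaks` are the five harmonic frequencies, and their magnitudes fall off as 1/k, matching the weights used to build the signal.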

Lesson 3: Sound Effects • Time-delay based sound effects • Discrete functions • Time-delay functions • Function operations • Connections to real-world applications: • Digital music effects and speech sound effects

Lesson 3: Sound Effects • Student MATLAB Project • Read a *.wav file • Use a delay function to modify the signal with an echo sound effect • Plot graphs and play sounds of the signals • Analyze the effect of changing parameters on the graphs and the sounds of the functions

Lesson 3: Sound Effects • Delay time of echo depends on distance to the reflection surface • Volume of echo depends on the reflection surface • Reflection coefficient α

Lesson 3: Sound Effects • Block diagram of echo effect • Output signal = input signal + reflection coefficient * delayed version of input signal • y[n] = x[n] + α*x[n-D]

Lesson 3: Sound Effects
% echo at 50 m with reflection coefficient = 0.5
% echo_effect('becky.wav', 50, 0.5);
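The echo equation y[n] = x[n] + α·x[n-D] can be sketched in Python as follows. This is not the MATLAB `echo_effect` from the slides: it works on a list of samples instead of a sound file, and the conversion from distance to delay is an assumption (sound travels to the surface and back at about 343 m/s, with the source and listener in the same place).

```python
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C (assumed)


def echo_delay_samples(distance_m, fs):
    """Round-trip travel time to the reflecting surface, expressed in samples."""
    return int(round(2.0 * distance_m / SPEED_OF_SOUND * fs))


def echo_effect(x, delay, alpha):
    """y[n] = x[n] + alpha * x[n - delay], treating x as zero outside its range.

    The output is `delay` samples longer than the input so the echo can finish.
    """
    y = []
    for n in range(len(x) + delay):
        direct = x[n] if n < len(x) else 0.0
        delayed = x[n - delay] if n >= delay else 0.0
        y.append(direct + alpha * delayed)
    return y


delay = echo_delay_samples(50, 8000)             # reflecting surface 50 m away
short = echo_effect([1.0, 0.0, 0.0], 2, 0.5)     # tiny example: delay of 2 samples
```

In `short`, the original impulse appears at index 0 and its half-amplitude copy at index 2, which is the block diagram on the previous slide in numbers.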

Lesson 4: SASE Lab • Guided inquiry of SASE Lab program • Experiment with different sound inputs • Analyze spectrogram • Make connections to previous lessons

Unit Conclusion • Students summarize and reflect on lessons in a presentation and report/poster


Questions? Thank you! • AEGIS website: http://research2.fit.edu/aegis-ret/ • Contacts: • Becky Dowell, dowell.jeanie@brevardschools.org • Dr. Veton Këpuska, vkepuska@fit.edu • Jacob Zurasky, jzuraksy@my.fit.edu
