Speech Processing. Applications of Images and Signals in High Schools. AEGIS RET All-Hands Meeting, University of Central Florida, June 22, 2012. Contributors: Dr. Veton Këpuska, Faculty Mentor, FIT; Jacob Zurasky, Graduate Student Mentor, FIT
Speech Processing Applications of Images and Signals in High Schools AEGIS RET All-Hands Meeting University of Central Florida June 22, 2012
Contributors • Dr. Veton Këpuska, Faculty Mentor, FIT • Jacob Zurasky, Graduate Student Mentor, FIT • Becky Dowell, RET Teacher, BPS Titusville High
Motivation • Speech audio processing is increasingly useful. • Applications • Siri on iPhone 4S • Automated telephone systems • Voice transcription (e.g., dictation software) • Hands-free computing (e.g., OnStar) • Video games (e.g., Xbox Kinect) • Military applications (e.g., aircraft control) • Healthcare applications
Motivation • Speech recognition requires speech to first be characterized by a set of “features”. • Features are used to determine which words were spoken. • Understanding how the features are computed is important. • Our project will implement the feature extraction stage of a speech processing application.
Work Completed • MATLAB fundamentals • Introduction to Signal Processing and Filtering • Beginning Project Implementation
Speech Recognition • Block diagram: Speech → Front End (pre-processing) → Features → Back End (recognition) → Recognized speech • Front-end input: large amount of data (e.g., 256 samples); front-end output: reduced data size (e.g., 13 features) • Front End – reduces the amount of data for the back end while keeping enough to describe the signal accurately. Its output is a feature vector (256 samples → 13 features). • Back End – statistical models classify each feature vector as a particular speech sound
Discrete Time Signals • A computer is a discrete system with finite memory, so it requires a discrete representation of sound • Sound is represented as a sequence of samples • time vs. amplitude • Amplitude = volume
Discrete Time Signals • Sampling rate (# of samples per second) • 8 kHz – telephone • 44.1 kHz – CD audio • 96 kHz – DVD audio
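As a small illustration of sampling, the sketch below builds one second of a pure tone at the 8 kHz telephone rate. The tone frequency, the duration, and all variable names are assumptions chosen for the example, not values taken from the project.

```matlab
fs = 8000;                  % sampling rate in samples per second (telephone quality)
duration = 1;               % length of the signal in seconds (assumed)
f0 = 440;                   % tone frequency in Hz (assumed)

n = 0:(fs*duration - 1);    % sample numbers
t = n / fs;                 % time of each sample in seconds
x = sin(2*pi*f0*t);         % amplitude of each sample

numSamples = numel(x);      % fs * duration samples in total
samplePeriod = 1 / fs;      % time between neighboring samples, in seconds

% sound(x, fs);             % uncomment to listen to the tone
% plot(t(1:100), x(1:100)); % uncomment to plot time vs. amplitude
```

Raising `fs` to 44100 stores the same second of sound with 44,100 samples instead of 8,000, which is the memory cost of the higher-quality rates listed above.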
Frequency Domain • Need to analyze signals over frequency rather than time. • Sound is composed of many frequencies at the same time • Frequency determines the pitch of the sound • To recognize the sound, we need to know the frequencies that make the sound.
Fast Fourier Transform (FFT) • Algorithm used to transform a signal from the time domain to the frequency domain. • MATLAB function: fft(x, N) • x – discrete time signal • N – FFT size • Transform: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$ • X – frequency spectrum • k – frequency bin • N – FFT size • n – sample number • x[n] – input signal
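One way to see what the FFT computes is to evaluate the discrete Fourier transform sum, X[k] = Σ x[n]·e^(−j2πkn/N) for n = 0 … N−1, directly and compare it with MATLAB's built-in `fft`. The input values and the small FFT size below are assumptions picked to keep the example easy to follow.

```matlab
N = 8;                          % FFT size (small and assumed, for readability)
x = [1 2 3 4 0 0 0 0];          % example input signal x[n] (assumed)
n = 0:N-1;                      % sample numbers

X_direct = zeros(1, N);
for k = 0:N-1                   % loop over frequency bins
    % X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)
    X_direct(k+1) = sum(x .* exp(-1j*2*pi*k*n/N));
end

X_fft = fft(x, N);                      % built-in FFT of the same signal
maxDiff = max(abs(X_direct - X_fft));   % the two differ only by rounding error
```

The loop needs on the order of N² multiplications, while the FFT reaches the same result in on the order of N·log₂N, which is why it is the practical choice for long signals.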
Sine Wave Example • MATLAB function sine_sound • Generate 3 sine waves and a composite signal • Play sound and plot graphs • Compute and plot FFT of composite signal
Sine Wave Example
% plays a C major chord (C4, E4, G4)
sine_sound(8000, 261.626, 329.628, 391.995, 1, 4096);
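The source of `sine_sound` is not shown in the slides, so the following is only a sketch of the steps the example describes. It assumes the arguments are, in order, the sampling rate, three frequencies in Hz, the duration in seconds, and the FFT size; the peak-search width is also an assumption.

```matlab
fs = 8000;                          % sampling rate in Hz
freqs = [261.626 329.628 391.995];  % C4, E4, G4 in Hz
duration = 1;                       % seconds
N = 4096;                           % FFT size

t = (0:fs*duration - 1) / fs;       % sample times
waves = sin(2*pi*freqs(:)*t);       % 3-by-numSamples matrix, one sine wave per row
composite = sum(waves, 1) / 3;      % composite signal, scaled to stay within [-1, 1]

X = abs(fft(composite, N));         % magnitude spectrum of the first N samples
binHz = fs / N;                     % frequency spacing between FFT bins
half = X(1:N/2);                    % keep bins from 0 Hz up to just below fs/2

% Find the strongest bin within +/- 10 Hz of each note
peakHz = zeros(1, 3);
w = round(10 / binHz);              % search half-width in bins
for i = 1:3
    k = round(freqs(i) / binHz);    % nearest bin, counted from 0
    idx = (k - w):(k + w);          % candidate bins, counted from 0
    [~, m] = max(half(idx + 1));    % +1 because MATLAB indexing starts at 1
    peakHz(i) = idx(m) * binHz;     % frequency of the strongest candidate
end

% sound(composite, fs);             % uncomment to play the chord
% plot((0:N/2-1)*binHz, half);      % uncomment to plot the spectrum
```

Each entry of `peakHz` should land within one bin width (`fs/N`, about 2 Hz here) of the corresponding note, which shows how the FFT recovers the individual frequencies from the composite signal.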
Front-End Processing of Speech Recognizer • Pre-emphasis • Window • FFT • Mel-Scale • log • IFFT
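The FFT, log, and IFFT stages at the end of this chain can be sketched together as a simple cepstrum calculation. This is a simplified illustration, not the project's implementation: the mel-scale stage is left out, and the test frame, sampling rate, and choice of the first 13 values are assumptions.

```matlab
N = 256;                        % frame length (matches the 256-sample example)
n = 0:N-1;
fs = 8000;                      % sampling rate in Hz (assumed)

% Stand-in for one pre-emphasized, windowed frame of speech (assumed test signal)
tone = sin(2*pi*300*n/fs) + 0.5*sin(2*pi*1200*n/fs);
frame = tone .* (0.54 - 0.46*cos(2*pi*n/(N-1)));

magnitude = abs(fft(frame, N)); % FFT: magnitude spectrum of the frame
% (the mel-scale stage would be applied here; omitted in this sketch)
logMag = log(magnitude + eps);  % log: compress the range; eps avoids log(0)
cepstrum = real(ifft(logMag));  % IFFT: transform the log spectrum back
features = cepstrum(1:13);      % keep the first 13 values as the feature vector
```

The frame goes in as 256 samples and comes out as 13 numbers, the same reduction in data size that the front end is meant to achieve.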
Pre-Emphasis • 1st order FIR filter • In human speech, higher frequencies have less energy, so this high-frequency roll-off needs to be compensated • High-pass filter
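A first-order FIR pre-emphasis filter is commonly written as y[n] = x[n] − a·x[n−1]. The sketch below assumes a = 0.97, a typical textbook value that the slides do not specify, and uses a made-up input with a slow part and a fast part to show the high-pass behavior.

```matlab
a = 0.97;                       % pre-emphasis coefficient (assumed)
x = [1 1 1 1 1 -1 1 -1 1 -1];   % constant (slow) part, then alternating (fast) part

% y[n] = x[n] - a*x[n-1], a first-order FIR high-pass filter
y = filter([1 -a], 1, x);

% Filter gain at 0 Hz and at the highest frequency (half the sampling rate)
gainLow  = abs(1 - a);          % small: slow changes are attenuated
gainHigh = abs(1 + a);          % close to 2: fast changes are boosted
```

In `y`, the constant part shrinks to about 0.03 while the alternating part grows to about ±1.97, matching the two gains.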
Windowing • Separate the speech signal into frames • Apply a window to smooth the edges of each frame of the speech signal
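Framing and windowing can be sketched as follows. The slides do not name a window type, so the Hamming window here is an assumption, as are the test signal and the 50% overlap between frames. The window is written out as a formula so that no toolbox function is needed.

```matlab
x = sin(2*pi*440*(0:999)/8000); % 1000 samples of a test tone (assumed input)
frameLen = 256;                 % samples per frame
hop = 128;                      % step between frame starts (50% overlap, assumed)

numFrames = floor((numel(x) - frameLen) / hop) + 1;
n = (0:frameLen-1)';
w = 0.54 - 0.46*cos(2*pi*n/(frameLen-1));   % Hamming window (assumed window type)

frames = zeros(frameLen, numFrames);
for i = 1:numFrames
    start = (i-1)*hop + 1;                            % first sample of frame i
    frames(:, i) = x(start:start+frameLen-1)' .* w;   % one windowed frame per column
end
```

The window tapers each frame toward its ends (the weight there is 0.08 rather than 1), which reduces the artificial jump that cutting the signal into pieces would otherwise create.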
Mel-Scale • Model sound as humans perceive it – approximately logarithmically. • At high frequencies, a larger change in frequency is required to notice a difference • Convert linear scale (Hz) to an approximately logarithmic scale (mel-scale)
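The slides do not give a conversion formula. A widely used one is mel = 2595·log10(1 + f/700), and the sketch below assumes that formula; the helper names `hz2mel` and `mel2hz` are made up for this example.

```matlab
% Common Hz-to-mel formula and its inverse (assumed; not given in the slides)
hz2mel = @(f) 2595 * log10(1 + f/700);
mel2hz = @(m) 700 * (10.^(m/2595) - 1);

% The same 100 Hz step, taken at a low and at a high frequency
stepLow  = hz2mel(600)  - hz2mel(500);    % mel change from 500 Hz to 600 Hz
stepHigh = hz2mel(5100) - hz2mel(5000);   % mel change from 5000 Hz to 5100 Hz

roundTrip = mel2hz(hz2mel(440));          % converting back returns the original Hz
```

`stepLow` comes out several times larger than `stepHigh`: on the mel scale a 100 Hz change matters much more at 500 Hz than at 5000 Hz, which ties in with the logarithm topics in the Pre-Calculus course.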
Connections to High School Mathematics Curriculum • Florida Math Standard (NGSSS) MA.912.T.1.8: • Solve real world problems involving applications of trigonometric functions using graphing technology when appropriate. • Pre-Calculus course • related topics include graphs of trigonometric functions, unit circle, logarithmic scale, complex numbers in trig form
Timeline • Week 1 • MATLAB fundamentals • MATLAB Filter Design & Analysis Tool • Introduction to Signal Processing, FFT, Filtering • Identified topics connected to high school math curriculum • Week 2 • Continued tutorials on signal processing and filtering • Implementation of sample code for use in lesson plans • Implementation of Pre-emphasis, Windowing, FFT, Cepstral Transform
Timeline • Week 3 – 6 • Implementation of Front-End Speech Processing • Work on deliverables.
Thank you! Questions?