
What Is Signal Processing?


Presentation Transcript


  1. What Is Signal Processing? Signal Processing (n). (1) Conversion of a signal f(t,x,y,z) (as measured by sensors at x,y,z) to a form that’s easier to interpret or store.
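The "conversion to a form that's easier to interpret" in this definition can be made concrete with a small sketch (illustrative code, not from the slides): a naive discrete Fourier transform turns time samples into frequency coefficients, where a pure tone reduces to a single peak.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: converts N time-domain
    samples into N frequency-domain coefficients."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A pure 100 Hz tone sampled at 800 Hz for one period (8 samples).
fs, f0 = 800, 100
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(8)]

X = dft(x)
mags = [abs(c) for c in X]
# The energy concentrates in bin 1 (100 Hz) and its mirror bin 7,
# a far easier representation to interpret or store than raw samples.
```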

  2. What Can I do with Signal Processing? Imaging (biomedical imaging, satellite remote sensing, distributed sensor networks, acoustic beamforming) Data Security (cryptography, watermarking) Communications (DSP implementation of modulators & error-correction codes at IF, RF, and even OF) Artificial Intelligence (computer vision, audiovisual scene understanding, speech recognition, cognitive world modeling) Data Mining (image recognition, audio recognition, audio & visual similarity measurement)

  3. Useful Signal Processing Classes Core DSP Classes: ECE 310 (DSP), 320 (DSP Lab), 313 (Probability), 359 (Communications I) Artificial Intelligence/Pattern Recognition DSP: ECE 302 (Music Synthesis), ECE 348 (Artificial Intelligence), ECE 370 (Robotics), ECE 386 (Controls), ECE 389 (Robot Dynamics) Communications/Digital Modulation: ECE 338 (Communication Networks), ECE 361 (Communications II)

  4. Where can I go with Signal Processing? Universities (Texas A&M, University of Maryland) DSP Companies (Motorola, TI) Communications Companies (Motorola, Rockwell, Boeing) Audio Companies (Shure, Nuance) Medical Imaging Companies (GE, Siemens)

  5. I. What Can Machines Do Now? - Telephone-based Dialog Systems - Dictation for Word Processing - Speech I/O for Disabled Users - Speech I/O for Busy Users (e.g. Radiologists) - Hands-free GPS in new Jaguar

  6. II. What Can People Do That Machines Can’t Do? - Two Voices at Once (TV is on --- why can’t I talk to my toaster?) - Reverberation (Do I need to put padding on all of the walls?) - Noise (automobile, street)

  7. II. Example 1: Two Voices at Once

  8. II. Example 1: Two Voices at Once

  9. II. Example 2: Reverberation - Recorded speech equals input(t-delay 1) + input(t-delay 2) - Delays are longer than a vowel, so two different vowels get mixed together - Result: Just like 2 different speakers!!
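Slide 9's model — the recording equals the input plus delayed copies of itself — can be sketched in a few lines (illustrative code, not from the deck):

```python
def reverberate(x, delays):
    """Model a reverberant recording as the sum of delayed copies of
    the input, per the slide: y(t) = x(t - delay1) + x(t - delay2)."""
    y = [0.0] * (len(x) + max(delays))
    for d in delays:
        for t, v in enumerate(x):
            y[t + d] += v  # add a copy of x delayed by d samples
    return y

x = [1.0, 2.0, 3.0]
y = reverberate(x, [0, 2])
# y == [1.0, 2.0, 4.0, 2.0, 3.0]: the overlap region mixes two
# different parts of the input, just like two simultaneous speakers.
```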

  10. II. Example 2: Reverberation The Only Way to Totally Avoid Reverberation:

  11. III. How do Machines Produce Speech?

  12. III. How do Machines Produce Speech?

  13. III. How do Machines Recognize Speech?

  14. III. Front End Processor

  15. III. “Statistical” Speech Recognition Classification: Choose the “most probable” C: C = argmax p(C|O) = argmax p(O|C) p(C) / p(O) = argmax p(O|C) p(C), where p(C) is the “language model” and p(O|C) is the “acoustic model”
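The argmax rule on this slide drops p(O) because it is the same for every candidate class C. A minimal sketch of the decision rule (the function names, the toy priors, and the likelihood table are all hypothetical):

```python
def classify(observation, classes, acoustic_model, language_model):
    """Pick the class C maximizing p(O|C) * p(C); p(O) is constant
    across classes, so it can be dropped from the argmax."""
    return max(classes,
               key=lambda c: acoustic_model(observation, c) * language_model(c))

# Hypothetical toy models for two word classes.
priors = {"yes": 0.7, "no": 0.3}                       # p(C), language model
likelihoods = {("beep", "yes"): 0.5, ("beep", "no"): 0.9}  # p(O|C), acoustic model

best = classify("beep", ["yes", "no"],
                lambda o, c: likelihoods[(o, c)],
                lambda c: priors[c])
# p(O|yes)p(yes) = 0.35 beats p(O|no)p(no) = 0.27, so best == "yes":
# a strong prior can override a weaker acoustic match.
```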

  16. IV. Hidden Markov Model

  17. IV. Hidden Markov Model Maximize p(O,Q) = p(i) p(o1|i) p(i|i) p(o2|i) p(i|i) p(o3|i) p(j|i) p(o4|j) ...
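The product on this slide — initial probability, then alternating emission and transition terms along a fixed state path — can be computed directly. A sketch with a hypothetical two-state model (all probabilities here are made up for illustration):

```python
def path_probability(pi, trans, emit, states, obs):
    """Joint probability p(O,Q) of observations O along a fixed state
    path Q: p(q1) p(o1|q1) * product of p(qt|qt-1) p(ot|qt)."""
    p = pi[states[0]] * emit[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[states[t - 1]][states[t]] * emit[states[t]][obs[t]]
    return p

# Hypothetical model matching the slide's pattern
# p(i) p(o1|i) p(i|i) p(o2|i) ... p(j|i) ...
pi = {"i": 1.0, "j": 0.0}
trans = {"i": {"i": 0.8, "j": 0.2}, "j": {"i": 0.0, "j": 1.0}}
emit = {"i": {"a": 0.5, "b": 0.5}, "j": {"a": 0.1, "b": 0.9}}

p = path_probability(pi, trans, emit, ["i", "i", "j"], ["a", "a", "b"])
# p = 1.0*0.5 * 0.8*0.5 * 0.2*0.9 = 0.036
```

Recognition then maximizes this quantity over paths Q, which is what the Viterbi algorithm does efficiently.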

  18. IV. Semantic Parsing

  19. IV. Response Generation Database Response: 12 flights Priority Ranking of Information: 1. Destination City 2. Origin City 3. Date 4. Price ….. Response Generation: “There are 12 flights tomorrow morning from Champaign to San Francisco. What price range would you like to consider?”
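The priority-ranked slot filling on this slide can be sketched as a template generator: report the database hit count, mention filled slots in priority order, then prompt for the highest-priority slot still missing. All function names, slot values, and the prompt table below are hypothetical:

```python
def generate_response(db_count, slots, ranking, prompts):
    """Template response generation driven by a priority ranking of
    information slots: filled slots are reported, and the user is
    prompted for the highest-priority unfilled slot."""
    filled = [slots[k] for k in ranking if k in slots]
    unfilled = next(k for k in ranking if k not in slots)
    return f"There are {db_count} flights " + " ".join(filled) + f". {prompts[unfilled]}"

slots = {"destination": "to San Francisco",
         "origin": "from Champaign",
         "date": "tomorrow morning"}
ranking = ["destination", "origin", "date", "price"]
prompts = {"price": "What price range would you like to consider?"}

resp = generate_response(12, slots, ranking, prompts)
# "There are 12 flights to San Francisco from Champaign tomorrow
#  morning. What price range would you like to consider?"
```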

  20. IV. How Do People Process Speech?

  21. IV. Step 1: a. Create Acoustic Targets b. Convert Acoustic Goals to Movement Goals --- just like a robot control problem

  22. IV. Step 2: Acoustic Resonator

  23. IV. Step 3: Mechanical Filtering

  24. IV. Step 4: Mechanical Pseudo-Fourier Transform

  25. IV. Step 5: Mechano-Electric Transduction

  26. IV. Step 6: Beam-Forming --- Correlate Signals from 2 Ears ---
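Correlating the signals from the two ears, as in this step, amounts to estimating the interaural delay: pick the lag that maximizes the cross-correlation between the two signals. A sketch of that ingredient of beamforming (illustrative, not the actual auditory mechanism):

```python
def best_lag(left, right, max_lag):
    """Estimate the interaural delay by cross-correlating the two ear
    signals and returning the lag with the largest correlation.
    Negative lag means the right ear hears the source later."""
    def corr(lag):
        return sum(left[n] * right[n - lag]
                   for n in range(len(left))
                   if 0 <= n - lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=corr)

left  = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5]  # same source, 2 samples later
lag = best_lag(left, right, 3)
# lag == -2: the right ear hears the source 2 samples after the left,
# which a beamformer converts into a direction of arrival.
```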

  27. IV. Step 7: Understand & Respond

  28. Conclusions • Telephone Speech Technology in Limited Domains works well. • Speech Recognition doesn’t understand • Multiple Voices • Reverberation • Other kinds of variability, e.g. accents • Better Speech Technology can be produced by learning from Human Speech Processing

  29. Background: Stop Consonant Release • Three “Places of Articulation”: • Lips (b,p) • Tongue Blade (d,t) • Tongue Body (g,k)

  30. Problem Statement: Content of Speech is Multivariate 1. Source Information: Prosody, Articulatory Features

  31. Content of Speech is Multivariate 2. Useful Non-Source Information: Composite Acoustic Cues

  32. Composite Cues: Traditional Solution

  33. Types of Measurement Error • Small Errors: Spectral Perturbation • Large Errors: Pick the Wrong Peak (Figure: Amp. (dB) vs. Frequency (Hertz))

  34. Large Errors are 20% of Total Std Dev of Small Errors = 45-72 Hz; Std Dev of Large Errors = 218-1330 Hz; P(Large Error) = 0.17-0.22 (Figure: log PDF vs. measurement error in Hertz, re: manual transcriptions)
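These statistics suggest modeling measurement error as a two-component mixture: narrow spectral perturbations most of the time, plus a wide wrong-peak component roughly 20% of the time. The sketch below uses illustrative values drawn from the ranges on this slide, not fitted parameters:

```python
import math

def gauss(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def error_density(e, p_large=0.2, sigma_small=60.0, sigma_large=500.0):
    """Two-component mixture for measurement error (in Hz):
    small errors ~80% of the time (std within the slide's 45-72 Hz
    range), large wrong-peak errors ~20% of the time (std within
    218-1330 Hz). The specific sigmas are illustrative midpoints."""
    return (1 - p_large) * gauss(e, sigma_small) + p_large * gauss(e, sigma_large)

# The density is sharply peaked near zero but has heavy tails: a
# 1000 Hz error is rare under the small component yet still plausible
# under the large one, matching the observed wrong-peak behavior.
```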

  35. Measurement Error Predicts Classification Error

  36. Solution: Composite Cues as State Variables

  37. Description of the Test System

  38. Test System Results

  39. A Posteriori Measurement Distributions: 10ms After /d/ in “dark” (Figure: DFT Amplitude, DFT Convexity, and P(F | O, Q) vs. Frequency, 0-4000 Hertz)

  40. MRI Image Collection • GE Signa 1.5T • T1-weighted • 3mm slices • 24 cm FOV • 256 x 256 pixels • Coronal, Axial • 3 Subjects • 11 Vowels • Breath-hold in vowel position for 25 seconds

  41. MRI Image Segmentation • In CTMRedit: • Manual • Seeded Region Growing • Tested: • Snake • Structural Saliency
