
So far: Historical overview of speech technology, basic components/goals for systems


luigi

Presentation Transcript


  1. Next talk focuses on the nature of the signal: • Acoustic waves in small spaces (sources) • Acoustic waves in large spaces (rooms) • So far: • Historical overview of speech technology, basic components/goals for systems • Quick review of DSP fundamentals • Quick overview of pattern recognition basics

  2. Acoustic waves - a brief intro • A way to bridge from thinking about EE to thinking about acoustics: • Acoustic signals are like electrical ones, only much slower … • Pressure is like voltage • Volume velocity is like current (and impedance = pressure / volume velocity) • For wave solutions, c is a lot smaller • To analyze, look at constrained models of common structures: strings and tubes

  3. [Figure: string segment between x and x + dx]

  4. So $c^2 \frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial t^2}$ is the wave equation for transverse vibration on a string • Here c can be derived from the properties of the medium, and is the wave propagation speed

  5. Solutions depend on boundary conditions • Assume form f(t - x/c) for positive x direction • Then f(t + x/c) for negative x direction • Sum is A f(t - x/c) + B f(t + x/c)
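The travelling-wave form above can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes a Gaussian pulse for f, illustrative amplitudes A = 1 and B = 0.5, and c = 340 m/s, and uses central differences to confirm that the sum satisfies the wave equation $c^2 \partial^2 y/\partial x^2 = \partial^2 y/\partial t^2$. All function names are hypothetical.

```python
import math

C = 340.0  # assumed propagation speed in m/s (illustrative choice)

def pulse(s):
    """Smooth Gaussian pulse standing in for the arbitrary waveform f."""
    return math.exp(-(s / 1e-3) ** 2)

def travelling_sum(x, t, a=1.0, b=0.5):
    """A f(t - x/c) + B f(t + x/c): right-going plus left-going wave."""
    return a * pulse(t - x / C) + b * pulse(t + x / C)

def second_derivatives(y, x, t, dx=1e-3, dt=1e-6):
    """Central-difference estimates of y_tt and y_xx at the point (x, t)."""
    y_tt = (y(x, t + dt) - 2 * y(x, t) + y(x, t - dt)) / dt ** 2
    y_xx = (y(x + dx, t) - 2 * y(x, t) + y(x - dx, t)) / dx ** 2
    return y_tt, y_xx

y_tt, y_xx = second_derivatives(travelling_sum, x=0.1, t=5e-4)
# If the sum solves the wave equation, y_tt and c^2 * y_xx agree
# up to finite-difference error.
relative_error = abs(y_tt - C ** 2 * y_xx) / abs(y_tt)
print(f"relative mismatch: {relative_error:.2e}")
```

Any twice-differentiable f would do here; the Gaussian is only a convenient choice.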

  6. [Figure: uniform tube with the excitation source at x = 0 and the open end at x = L]

  7. Plane wave propagation for frequencies below ~4000 Hz • $c = f\lambda$

  8. By looking at the solutions to this equation, we can show that c is the speed of sound


  10. + - e jt - + + + + - Let u+(t - x/c) = A e j(t - x/c) and u-(t + x/c) = B e j(t + x/c) u(0,t) = ej t = A e j(t - 0/c) - B e j(t + 0/c)

  11. Problem: Find A and B to match boundary conditions • At the source: $u(0,t) = e^{j\omega t} = A e^{j\omega(t - 0/c)} - B e^{j\omega(t + 0/c)}$ • At the open end: $p(L,t) = 0 = A e^{j\omega(t - L/c)} + B e^{j\omega(t + L/c)}$ • Solve for A and B (eliminate t) • Now you can get equation 10.24 in text, for excitation $U(\omega) e^{j\omega t}$ (upcoming homework problem): $u(x,t) = \frac{\cos[\omega(L - x)/c]}{\cos[\omega L/c]} U(\omega) e^{j\omega t}$ • Poles occur when $f = (2n + 1)c/4L$, i.e. $\omega = (2n + 1)\pi c/2L$
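As a cross-check of the boundary-value problem above, here is a minimal sketch (not from the slides) that solves the two boundary conditions for A and B with unit excitation and the common factor $e^{j\omega t}$ removed, then compares the resulting u(x) with the cosine ratio. The values c = 340 m/s and L = 17 cm are the ones used on the next slide; the 300 Hz test frequency and all function names are assumptions chosen for illustration.

```python
import cmath
import math

C = 340.0  # speed of sound in m/s
L = 0.17   # tube length in m

def solve_amplitudes(omega):
    """Solve A - B = 1 and A e^{-jwL/c} + B e^{+jwL/c} = 0 for A and B."""
    k_l = omega * L / C
    e_minus = cmath.exp(-1j * k_l)
    e_plus = cmath.exp(1j * k_l)
    # The second condition gives B = -A * e_minus / e_plus;
    # substituting into A - B = 1 yields the expressions below.
    a = e_plus / (e_plus + e_minus)
    b = -e_minus / (e_plus + e_minus)
    return a, b

def velocity_from_waves(x, omega):
    """u(x) built from the forward and backward travelling waves."""
    a, b = solve_amplitudes(omega)
    k_x = omega * x / C
    return a * cmath.exp(-1j * k_x) - b * cmath.exp(1j * k_x)

def velocity_closed_form(x, omega):
    """cos[w(L - x)/c] / cos[wL/c], the ratio quoted on the slide."""
    return math.cos(omega * (L - x) / C) / math.cos(omega * L / C)

omega = 2 * math.pi * 300.0  # hypothetical test frequency, away from any pole
for x in (0.0, 0.05, 0.17):
    print(x, velocity_from_waves(x, omega), velocity_closed_form(x, omega))
```

The closed form blows up where $\cos(\omega L/c) = 0$, which is where the poles listed on the slide come from.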

  12. c = 340 m/s • L = 17 cm • 4L = 0.68 m • f1 = c/4L = 500 Hz
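The arithmetic for this tube can be sketched in a few lines, using the pole formula f = (2n + 1)c/4L with the values given above. The function name is hypothetical.

```python
def tube_resonances(c, length, count):
    """First `count` resonances of a uniform tube closed at the source, open at the far end."""
    return [(2 * n + 1) * c / (4 * length) for n in range(count)]

modes = tube_resonances(c=340.0, length=0.17, count=3)
print([round(f) for f in modes])  # prints [500, 1500, 2500]
```

The resonances fall at odd multiples of the quarter-wavelength frequency, so for a 17 cm tube they are spaced 1000 Hz apart.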

  13. First 3 modes of an acoustic tube open at one end

  14. Effect of losses in the tube • Upward shift in lower resonances • Poles no longer on unit circle - peak values in frequency response are finite

  15. Effect of nonuniformities in the tube • Impedance mismatches cause reflections • Can be modeled as a succession of smaller tubes • Resonances move around - hence the different formants for different speech sounds


  18. Acoustic reverberation • Reflection vs absorption at room surfaces • Effects tend to be more important than room modes for speech intelligibility • Also very important for musical clarity, tone

  19. (uniformly distributed and diffuse)

  20. Decay of intensity when source is shut off (W = 0)


  22. = 4mV

  23. The phrase “two oh six” convolved with the impulse response of a room with a 0.5-second RT60
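The operation behind this demo can be sketched as follows. This is an illustration only: the slide used a measured room response and recorded speech, whereas the sketch assumes a synthetic response (Gaussian noise under an exponential envelope that falls 60 dB in RT60 seconds), a short noise burst as a stand-in for the speech, and a low 2 kHz sampling rate to keep the pure-Python convolution fast. All names are hypothetical.

```python
import random

def decay_envelope(t, rt60):
    """Amplitude envelope that drops by 60 dB (a factor of 1000) after rt60 seconds."""
    return 10 ** (-3.0 * t / rt60)

def synthetic_impulse_response(rt60, fs, seed=0):
    """Noise shaped by the decay envelope: a crude stand-in for a room response."""
    rng = random.Random(seed)
    n = int(rt60 * fs)
    return [rng.gauss(0.0, 1.0) * decay_envelope(i / fs, rt60) for i in range(n)]

def convolve(x, h):
    """Direct-form linear convolution; output has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

fs = 2000                                  # assumed sampling rate in Hz
room = synthetic_impulse_response(rt60=0.5, fs=fs)
rng = random.Random(1)
dry = [rng.gauss(0.0, 1.0) for _ in range(200)]   # placeholder for the speech signal
wet = convolve(dry, room)
print(len(dry), len(room), len(wet))
```

The reverberant output is longer than the dry input by the length of the room response, which is the "tail" heard after the phrase ends.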

  24. Initial time delay gap = t0

  25. Measuring room responses • Impulsive sounds • Correlation of mic input with random signal source (since $R_{xy} = R_{xx} * h(t)$, where * denotes convolution) • Chirp input • Measurement also includes mic and speaker responses • No single room response (also not really linear)
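The correlation method can be sketched with a toy example. For a white-noise source, $R_{xx}$ is approximately an impulse, so the cross-correlation of the source with the microphone signal is approximately a scaled copy of h. The sketch below assumes an ideal linear, noise-free "room" with a made-up 8-tap response; the function names and values are hypothetical, and a real measurement would also include the mic and speaker responses noted above.

```python
import random

def simulate_room(x, h):
    """Mic signal: source x convolved with impulse response h (same length as x)."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def cross_correlation(x, y, max_lag):
    """Estimate R_xy(k) = E[x(n) y(n + k)] for lags k = 0 .. max_lag - 1."""
    n = len(x) - max_lag
    return [sum(x[i] * y[i + k] for i in range(n)) / n for k in range(max_lag)]

def estimate_impulse_response(x, y, max_lag):
    """For white x, R_xy(k) ~ R_xx(0) h(k), so divide by the source power."""
    r_xy = cross_correlation(x, y, max_lag)
    source_power = cross_correlation(x, x, 1)[0]
    return [r / source_power for r in r_xy]

rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(40000)]   # white-noise test signal
true_h = [1.0, 0.0, 0.6, 0.0, 0.0, -0.3, 0.0, 0.1]     # made-up room response
mic = simulate_room(source, true_h)
est_h = estimate_impulse_response(source, mic, len(true_h))
print([round(v, 2) for v in est_h])
```

The estimate is noisy, with an error that shrinks roughly as one over the square root of the number of samples, so longer noise sequences give cleaner responses.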

  26. Effects of reverb • Increases loudness • “Early” loudness increase helps intelligibility • “Late” loudness increase hurts intelligibility • When noise is present, ill effects compounded • Even worse for machine algorithms

  27. Dealing with reverb • Microphone arrays - beamforming • Reducing effects by subtraction/filtering • Stereo mic transfer function • Using robust features (for ASR especially) • Statistical adaptation

  28. Artificial reverberation • Physical devices (springs, plate, etc.) • Simple electronic delay with feedback • FIR for early delays (think of “initial time delay gap” in concert halls), IIR for later decay • Explicit convolution with stored response
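The "simple electronic delay with feedback" item can be sketched as a feedback comb filter, y[n] = x[n] + g·y[n - D]. This is a minimal illustration with hypothetical names and parameter values, not a description of any particular reverberator; practical designs combine several such sections.

```python
def feedback_delay(x, delay, gain):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        fed_back = gain * y[n - delay] if n >= delay else 0.0
        y.append(xn + fed_back)
    return y

impulse = [1.0] + [0.0] * 9
echoes = feedback_delay(impulse, delay=3, gain=0.5)
print(echoes)  # prints [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
```

Each pass around the loop scales the echo by the gain, so the impulse response decays exponentially as long as the gain magnitude is below 1; a gain of 1 or more makes the loop unstable.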
