Next talk focuses on the nature of the signal: • Acoustic waves in small spaces (sources) • Acoustic waves in large spaces (rooms) So far: • Historical overview of speech technology, basic components/goals for systems • Quick review of DSP fundamentals • Quick overview of pattern recognition basics
Acoustic waves - a brief intro • A way to bridge from thinking about EE to thinking about acoustics: • Acoustic signals are like electrical ones, only much slower … • Pressure is like voltage • Volume velocity is like current (and impedance = pressure / volume velocity) • For wave solutions, c is a lot smaller • To analyze, look at constrained models of common structures: strings and tubes
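Transverse vibration on a string [figure: string element between x and x + dx] • So $\frac{\partial^2 y}{\partial t^2} = c^2 \frac{\partial^2 y}{\partial x^2}$ is the wave equation for transverse vibration on a string, where c can be derived from the properties of the medium and is the wave propagation speed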
Solutions depend on boundary conditions • Assume form f(t - x/c) for a wave traveling in the positive x direction • Then f(t + x/c) for the negative x direction • Sum is A f(t - x/c) + B f(t + x/c)
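A quick numerical check of the traveling-wave form can make this concrete. The sketch below assumes a hypothetical Gaussian pulse shape and c = 343 m/s (neither is from the slides) and uses central differences to confirm that A f(t - x/c) + B f(t + x/c) leaves the wave equation residual close to zero.

```python
import math

C = 343.0  # assumed wave speed in m/s; any positive value works for this check

def pulse(s):
    """Hypothetical smooth pulse shape f(s), 1 ms wide; chosen only for illustration."""
    return math.exp(-(s / 1e-3) ** 2)

def y(x, t, a=1.0, b=0.5):
    """Sum of a wave moving toward +x and a wave moving toward -x."""
    return a * pulse(t - x / C) + b * pulse(t + x / C)

def wave_equation_residual(x, t, dx=1e-3, dt=1e-6):
    """Central-difference estimate of y_tt - c^2 * y_xx, returned together with y_tt."""
    y_tt = (y(x, t + dt) - 2 * y(x, t) + y(x, t - dt)) / dt ** 2
    y_xx = (y(x + dx, t) - 2 * y(x, t) + y(x - dx, t)) / dx ** 2
    return y_tt - C ** 2 * y_xx, y_tt

residual, y_tt = wave_equation_residual(0.1, 0.0)
print(abs(residual) / abs(y_tt))  # small compared with 1 if the form solves the equation
```

The residual is not exactly zero because the finite differences only approximate the derivatives; it shrinks as dx and dt are reduced.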
Uniform tube, source on one end, open on the other [figure: tube along x from 0 to L, excitation at x = 0, open end at x = L]
Plane wave propagation for frequencies below ~4000 Hz • $c = f\lambda$
By looking at the solutions to the wave equation for the tube, $\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}$, we can show that c is the speed of sound
[figure: tube driven at x = 0 by a source $e^{j\omega t}$] • Let $u^+(t - x/c) = A e^{j\omega(t - x/c)}$ and $u^-(t + x/c) = B e^{j\omega(t + x/c)}$ • Problem: find A and B to match boundary conditions • At the source: $u(0,t) = e^{j\omega t} = A e^{j\omega(t - 0/c)} - B e^{j\omega(t + 0/c)}$ • At the open end: $p(L,t) = 0 = A e^{j\omega(t - L/c)} + B e^{j\omega(t + L/c)}$ • Solve for A and B (eliminate t) • Now you can get equation 10.24 in text, for excitation $U(\omega) e^{j\omega t}$ (upcoming homework problem): $u(x,t) = \frac{\cos[\omega(L - x)/c]}{\cos[\omega L/c]} \, U(\omega) e^{j\omega t}$ • Poles occur when $\omega = (2n + 1)\pi c/2L$, i.e. $f = (2n + 1)c/4L$
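To put numbers on the pole formula, here is a minimal sketch. The tube length of 0.17 m (roughly vocal-tract sized) and c = 343 m/s are assumptions for illustration, as are the function names.

```python
import math

def tube_resonances(length_m, c=343.0, count=3):
    """Pole frequencies f_n = (2n + 1) c / (4 L) for a tube driven at one end, open at the other."""
    return [(2 * n + 1) * c / (4.0 * length_m) for n in range(count)]

def velocity_gain(freq_hz, x_m, length_m, c=343.0):
    """|u(x)/U| = |cos(w (L - x) / c) / cos(w L / c)| for the lossless uniform tube."""
    w = 2.0 * math.pi * freq_hz
    return abs(math.cos(w * (length_m - x_m) / c) / math.cos(w * length_m / c))

# Assumed length: 0.17 m; assumed speed of sound: 343 m/s.
formants = tube_resonances(0.17)
print([round(f, 1) for f in formants])

# The gain at the open end grows without bound as the frequency approaches a pole.
print(velocity_gain(100.0, 0.17, 0.17), velocity_gain(504.0, 0.17, 0.17))
```

With these assumed values the resonances fall near 500, 1500, and 2500 Hz, odd multiples of the quarter-wavelength frequency.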
Effect of losses in the tube • Upward shift in lower resonances • Poles no longer on unit circle - peak values in frequency response are finite
Effect of nonuniformities in the tube • Impedance mismatches cause reflections • Can be modeled as a succession of smaller tubes • Resonances move around - hence the different formants for different speech sounds
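The "succession of smaller tubes" idea can be sketched by computing the pressure reflection coefficient at each junction. This assumes lossless plane waves, a characteristic impedance of rho*c/area for each section, and illustrative values for air density and sound speed; the function names are hypothetical, and other texts define the coefficient for volume velocity, which flips the sign.

```python
RHO = 1.2   # assumed air density in kg/m^3
C = 343.0   # assumed speed of sound in m/s

def characteristic_impedance(area_m2):
    """Plane-wave acoustic impedance (pressure / volume velocity) of a tube section."""
    return RHO * C / area_m2

def pressure_reflection(area_from, area_to):
    """Pressure reflection coefficient seen by a wave going from one section into the next."""
    z1 = characteristic_impedance(area_from)
    z2 = characteristic_impedance(area_to)
    return (z2 - z1) / (z2 + z1)

def junction_reflections(areas):
    """Reflection coefficient at every junction of a chain of tube sections."""
    return [pressure_reflection(a, b) for a, b in zip(areas, areas[1:])]

# Hypothetical three-section tube: no mismatch at the first junction, a widening at the second.
print(junction_reflections([1e-4, 1e-4, 4e-4]))
```

Equal areas give no reflection; a widening gives a negative pressure reflection, approaching -1 as the section opens into a very large area (the open-end condition p = 0).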
Acoustic reverberation • Reflection vs absorption at room surfaces • Effects tend to be more important than room modes for speech intelligibility • Also very important for musical clarity, tone
[Equations: sound field assumed uniformly distributed and diffuse] • Decay of intensity when source is shut off (W = 0) [equations] • [Equation: a term equal to 4mV]
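For reference, a standard diffuse-field (Sabine) argument runs as follows. This is a sketch in assumed notation, not necessarily the notation of the slides: D is energy density, V the room volume, a the total surface absorption, W the source power, and m the air attenuation constant.

```latex
% Energy balance: stored energy changes by source power minus absorption at the surfaces
V \frac{dD}{dt} = W - \frac{a c}{4} D

% Steady state (dD/dt = 0)
D_\infty = \frac{4W}{a c}

% Source shut off (W = 0): exponential decay
D(t) = D_0 \, e^{-\frac{a c}{4V} t}

% RT60 is the time for a 60 dB drop (SI units, c \approx 343 m/s)
T_{60} = \frac{24 \ln 10}{c} \cdot \frac{V}{a} \approx 0.161 \, \frac{V}{a}

% Air absorption adds 4mV to the surface absorption
T_{60} \approx 0.161 \, \frac{V}{a + 4mV}
```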
The phrase “two oh six” convolved with the impulse response from a room with a 0.5 second RT60
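The same operation can be sketched with a synthetic response. The code below assumes a crude room model (Gaussian noise under an exponential envelope that falls 60 dB over RT60), an 8 kHz sample rate, and a short noise burst standing in for the speech; none of these come from the slides.

```python
import math
import random

def synthetic_rir(rt60_s, fs_hz, seed=0):
    """Noise shaped by an envelope that drops 60 dB over rt60_s (a rough stand-in for a room)."""
    rng = random.Random(seed)
    n = int(rt60_s * fs_hz)
    return [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * i / (rt60_s * fs_hz)) for i in range(n)]

def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

FS = 8000                      # assumed sample rate in Hz
rir = synthetic_rir(0.5, FS)   # 0.5 s RT60, as in the example above
rng = random.Random(1)
dry = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # placeholder for a speech signal
wet = convolve(dry, rir)
print(len(dry), len(rir), len(wet))
```

For real recordings an FFT-based convolution would be much faster; the direct form is used here only to keep the sketch dependency-free.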
Measuring room responses • Impulsive sounds • Correlation of mic input with random signal source (since R(x,y) = R(x,x) * h(t), where * is convolution, so a white source gives R(x,y) proportional to h(t)) • Chirp input • Measurement also includes mic, speaker responses • No single room response (also not really linear)
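The correlation method can be sketched in a few lines. Here a short made-up response TRUE_H stands in for the room, the source is white Gaussian noise, and the estimate is the input-output cross-correlation divided by the input energy; all names and values are illustrative assumptions.

```python
import random

TRUE_H = [1.0, 0.0, 0.5, -0.25]  # hypothetical "room" response to be recovered

def convolve(x, h):
    """Direct-form linear convolution, used here to simulate the mic signal."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def estimate_impulse_response(x, y, taps):
    """Cross-correlate source x with mic signal y and normalize by the source energy."""
    energy = sum(v * v for v in x)
    return [sum(x[n] * y[n + k] for n in range(len(x))) / energy for k in range(taps)]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # white random source signal
y = convolve(x, TRUE_H)                         # simulated mic input
h_hat = estimate_impulse_response(x, y, len(TRUE_H))
print([round(v, 2) for v in h_hat])
```

The estimate is only approximate because a finite noise sequence is not perfectly white; longer sequences or averaging reduce the error.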
Effects of reverb • Increases loudness • “Early” loudness increase helps intelligibility • “Late” loudness increase hurts intelligibility • When noise is present, ill effects compounded • Even worse for machine algorithms
Dealing with reverb • Microphone arrays - beamforming • Reducing effects by subtraction/filtering • Stereo mic transfer function • Using robust features (for ASR especially) • Statistical adaptation
Artificial reverberation • Physical devices (springs, plate, etc.) • Simple electronic delay with feedback • FIR for early delays (think of “initial time delay gap” in concert halls), IIR for later decay • Explicit convolution with stored response
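The "simple electronic delay with feedback" item corresponds to a feedback comb filter, y[n] = x[n] + g y[n - D]. The sketch below is one minimal way to write it; the function names and the example delay, sample rate, and RT60 are assumptions for illustration.

```python
def feedback_comb(x, delay, gain):
    """Delay line with feedback: y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        fed_back = y[n - delay] if n >= delay else 0.0
        y.append(xn + gain * fed_back)
    return y

def gain_for_rt60(delay, fs_hz, rt60_s):
    """Feedback gain for which the echoes have dropped 60 dB after rt60_s seconds."""
    return 10.0 ** (-3.0 * delay / (fs_hz * rt60_s))

# Impulse response: an echo every `delay` samples, each scaled by `gain` relative to the last.
print(feedback_comb([1.0, 0.0, 0.0, 0.0, 0.0], delay=2, gain=0.5))

# Assumed example: 400-sample delay at 8 kHz, targeting a 0.5 s RT60.
print(gain_for_rt60(400, 8000, 0.5))
```

A single comb gives evenly spaced echoes, which sounds metallic; practical designs typically combine several such delays with different lengths, in line with the FIR-early, IIR-late split above.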