Signal Reconstruction from its Spectrogram

Signal Reconstruction from its Spectrogram Radu Balan IMAHA 2010, Northern Illinois University, April 24, 2010

Overview • Problem formulation • Reconstruction from absolute value of frame coefficients • Our approach • Embedding into the Hilbert-Schmidt space • Discrete Gabor multipliers • Quadratic reconstruction • Numerical example

1. Problem formulation • Typical signal processing “pipeline”: In → Analysis → Processing → Synthesis → Out • Features: • Relatively low complexity, O(N log N) • On-line version if possible

The Analysis/Synthesis Components: $x$ → Analysis, $c_i = \langle x, g_i \rangle$ → Synthesis → $y$ • Example: Short-Time Fourier Transform

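[Figure: STFT analysis. The segments x(t+kb : t+kb+M-1) and x(t+kb+M : t+kb+2M-1) are each multiplied by the window g(t), giving x(t+kb)g(t) and x(t+(k+1)b)g(t); an FFT of each product yields the coefficients $c_{k,0}, \dots, c_{k,F-1}$ and $c_{k+1,0}, \dots, c_{k+1,F-1}$. Axes: data frame index (k) and frequency (f).]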

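[Figure: STFT synthesis. The coefficients $c_{k,0}, \dots, c_{k,F-1}$ and $c_{k+1,0}, \dots, c_{k+1,F-1}$ each pass through an inverse FFT, each result is multiplied by the synthesis window $\hat{g}(t)$, and the products are added.]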
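The analysis/synthesis pair described on these slides can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function names `stft` and `istft`, the sine window, and the parameter values are assumptions chosen so that overlap-add with $\hat{g} = g$ reproduces the interior samples, and it assumes F ≥ M.

```python
import numpy as np

def stft(x, g, b, F):
    """Analysis: c[k, f] = sum_t x[t + k*b] * g[t] * exp(-2j*pi*f*t/F), assuming F >= len(g)."""
    M = len(g)
    n_frames = (len(x) - M) // b + 1
    frames = np.stack([x[k * b : k * b + M] * g for k in range(n_frames)])
    return np.fft.fft(frames, n=F, axis=1)  # each frame is zero-padded to F bins

def istft(c, g_hat, b, length):
    """Synthesis: inverse FFT of each frame, multiply by g_hat, then overlap-add."""
    M = len(g_hat)
    y = np.zeros(length, dtype=complex)
    for k, frame in enumerate(np.fft.ifft(c, axis=1)[:, :M]):
        y[k * b : k * b + M] += frame * g_hat
    return y

# Illustrative parameters (assumed): sine window with g[t]^2 + g[t + M/2]^2 = 1
rng = np.random.default_rng(0)
M, b, F = 8, 4, 8
g = np.sin(np.pi * (np.arange(M) + 0.5) / M)
x = rng.standard_normal(32)

c = stft(x, g, b, F)        # complex coefficients c[k, f]
d = np.abs(c)               # magnitudes only: the phase of c is discarded
y = istft(c, g, b, len(x))  # with full coefficients, interior samples are recovered
```

With the full complex coefficients `c` the interior of `x` is recovered exactly; the reconstruction problem of this talk starts from `d` alone.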

Reconstruction • Processing chain: $x$ → $c_{k,f}$ → $|\cdot|$ → $d_{k,f}$ • Problem: given the Short-Time Fourier Amplitudes (STFA) $d_{k,f} = |c_{k,f}|$, we want an efficient reconstruction algorithm: • Reduced computational complexity • On-line (“on-the-fly”) processing

Where is this problem important: • Speech enhancement • Speech separation • Old recording processing

2. Reconstruction from absolute value of frame coefficients • Setup: • $H = E^n$, where $E = \mathbb{R}$ or $E = \mathbb{C}$ • $F = \{f_1, f_2, \dots, f_m\}$ a spanning set of $m > n$ vectors • Consider the map, defined on vectors modulo a constant phase factor: $N(x) = (|\langle x, f_k \rangle|)_{1 \le k \le m}$ • Problem 1: When is $N$ injective? • Problem 2: Assume $N$ is injective. Given $c = N(x)$, construct a vector $y$ equivalent to $x$ (that is, invert $N$ up to a constant phase factor)
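A short numerical check makes the "up to a constant phase factor" caveat concrete: the magnitudes of the frame coefficients do not change when $x$ is multiplied by a unimodular constant. The sketch below assumes the map takes $x$ to the vector of magnitudes $|\langle x, f_k \rangle|$; the random frame and the sizes n = 3, m = 10 are illustrative choices.

```python
import numpy as np

def N(x, frame):
    """Magnitudes of the frame coefficients (|<x, f_k>|), k = 1..m.

    Rows of `frame` are the vectors f_k; <x, f_k> = sum_i x_i * conj(f_k[i]).
    """
    return np.abs(frame.conj() @ x)

rng = np.random.default_rng(0)
n, m = 3, 10  # illustrative sizes (assumed), with m > n
frame = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

c = N(x, frame)
c_rot = N(np.exp(1j * 0.7) * x, frame)  # same vector times a constant phase factor
```

Both calls return the same magnitudes, so $x$ can be recovered at best up to that factor.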

Theorem [R.B., Casazza, Edidin, ACHA (2006)] For $E = \mathbb{R}$: • if $m \ge 2n-1$, and a generic frame set $F$, then $N$ is injective; • if $m \le 2n-2$ then for any set $F$, $N$ cannot be injective; • $N$ is injective iff for any subset $J \subset F$ either $J$ or $F \setminus J$ spans $\mathbb{R}^n$; • if any $n$-element subset of $F$ is linearly independent, then $N$ is injective; for $m = 2n-1$ this is a necessary and sufficient condition.
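The subset criterion in the real case can be tested directly on small examples. The sketch below is a brute-force check over all $2^m$ subsets, so it is only practical for small $m$; the function names are hypothetical and the rank test relies on NumPy's default numerical tolerance.

```python
import numpy as np
from itertools import combinations

def spans(vectors, n):
    """True if the given rows span R^n (numerical rank test)."""
    return len(vectors) > 0 and np.linalg.matrix_rank(np.array(vectors)) == n

def has_complement_property(frame):
    """True iff for every subset J of the frame, J or its complement spans R^n."""
    m, n = frame.shape
    everything = set(range(m))
    for r in range(m + 1):
        for J in combinations(range(m), r):
            rest = sorted(everything - set(J))
            if not (spans(frame[list(J)], n) or spans(frame[rest], n)):
                return False
    return True

# n = 2, m = 3 = 2n - 1, every pair of vectors is linearly independent
good = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# n = 2, m = 2 = 2n - 2: splitting into {e1} and {e2} leaves neither part spanning
bad = np.eye(2)
```

Here `has_complement_property(good)` holds while `has_complement_property(bad)` fails, in line with the $m \ge 2n-1$ and $m \le 2n-2$ statements of the theorem.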

Theorem [R.B., Casazza, Edidin, ACHA (2006)] For $E = \mathbb{C}$: • if $m \ge 4n-2$, and a generic frame set $F$, then $N$ is injective; • if $m \ge 2n$ and a generic frame set $F$, then the set of points in $\mathbb{C}^n$ where $N$ fails to be injective is thin (its complement has dense interior).

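3. Our approach • Recall: $d_{k,f} = |c_{k,f}| = |\langle x, g_{k,f} \rangle|$ • First observation: with the rank-one operator $K_x = \langle \cdot, x \rangle x$, we have $|\langle x, g_{k,f} \rangle|^2 = \langle K_x, K_{g_{k,f}} \rangle_{HS}$ • [Figure: the nonlinear embedding $x \mapsto K_x$ maps the signal space $l^2(\mathbb{Z})$ into the Hilbert-Schmidt space $HS(l^2(\mathbb{Z}))$, which contains $E = \mathrm{span}\{K_{g_{k,f}}\}$.]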
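The embedding into the Hilbert-Schmidt space can be checked numerically in finite dimensions. The sketch assumes the rank-one operator $K_v = \langle \cdot, v \rangle v$ (matrix $v v^*$) and the Hilbert-Schmidt inner product $\langle A, B \rangle_{HS} = \mathrm{trace}(A B^*)$; the helper names `K` and `hs_inner` are illustrative.

```python
import numpy as np

def K(v):
    """Rank-one operator K_v = <., v> v, i.e. the matrix v v^*."""
    return np.outer(v, v.conj())

def hs_inner(A, B):
    """Hilbert-Schmidt inner product trace(A B^*) = sum_ij A_ij * conj(B_ij)."""
    return np.sum(A * B.conj())

rng = np.random.default_rng(0)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
g = rng.standard_normal(5) + 1j * rng.standard_normal(5)

lhs = hs_inner(K(x), K(g))       # linear in K_x
rhs = np.abs(np.vdot(g, x)) ** 2  # |<x, g>|^2, quadratic in x
same_operator = np.allclose(K(np.exp(1j * 1.3) * x), K(x))  # phase factor drops out
```

The squared magnitudes are therefore linear measurements of $K_x$, and $K_x$ is unchanged by a constant phase factor on $x$.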

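Frame operator • Assume $\{K_{g_{k,f}}\}$ form a frame for its span, $E$. Then the projection $P_E$ can be written as: $P_E = \sum_{k,f} \langle \cdot, K_{g_{k,f}} \rangle_{HS} \, Q_{k,f}$, where $\{Q_{k,f}\}$ is the canonical dual of $\{K_{g_{k,f}}\}$.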

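Second observation: since $\langle K_x, K_{g_{k,f}} \rangle_{HS} = |c_{k,f}|^2$, it follows: $P_E K_x = \sum_{k,f} |c_{k,f}|^2 \, Q_{k,f}$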

However: • Explicitly:

Short digression: Gabor Multipliers • Goes back to Weyl, Klauder, Daubechies • More recently: Feichtinger (2000), Benedetto-Pfander (2006), Dörfler-Torrésani (2008) • Theorem [F’00] Assume $\{g, \text{Lattice}\}$ is a frame for $L^2(\mathbb{R})$. Then the following are equivalent: • $\{\langle \cdot, g \rangle g, \text{Lattice}\}$ is a frame for its span in $HS(L^2(\mathbb{R}))$; • $\{\langle \cdot, g \rangle g, \text{Lattice}\}$ is a Riesz basis for its span in $HS(L^2(\mathbb{R}))$; • the function $H$ does not vanish.

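Return to our setting. Let: • Theorem: Assume $\{g_{k,f}\}_{(k,f) \in \mathbb{Z} \times \mathbb{Z}_F}$ is a frame for $l^2(\mathbb{Z})$. Then • $\{K_{g_{k,f}}\}$ is a frame for its span in $HS(l^2(\mathbb{Z}))$ iff for each $m \in \mathbb{Z}_F$, $H(\omega, m)$ either vanishes identically in $\omega$, or it is never zero; • $\{K_{g_{k,f}}\}$ is a Riesz basis for its span in $HS(l^2(\mathbb{Z}))$ iff for each $m \in \mathbb{Z}_F$ and $\omega$, $H(\omega, m)$ is never zero.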

Third observation. Under the following settings: • translation step $b = 1$; • window support $\mathrm{supp}(g) = \{0, 1, 2, \dots, L-1\}$; • $F \ge 2L$; • the span of $\{K_{g_{k,f}}\}$ is the set of band matrices with $2L-1$ diagonals.
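The band structure follows from the window support: if $g$ lives on $L$ consecutive samples, every entry of the rank-one matrix of a shifted, modulated window sits within $L-1$ of the main diagonal, which gives $2L-1$ diagonals in total. The sketch below checks this on a finite index set; the modulation convention in `gabor_atom`, the window values, and the sizes are assumptions made for illustration.

```python
import numpy as np

def gabor_atom(g, k, f, F, n):
    """Assumed convention: g_{k,f}(t) = exp(2j*pi*f*t/F) * g(t - k) on {0, ..., n-1}."""
    L = len(g)
    atom = np.zeros(n, dtype=complex)
    t = np.arange(k, k + L)
    atom[t] = np.exp(2j * np.pi * f * t / F) * g
    return atom

def bandwidth(A, tol=1e-12):
    """Largest |i - j| over the entries of A whose magnitude exceeds tol."""
    i, j = np.nonzero(np.abs(A) > tol)
    return int(np.max(np.abs(i - j))) if i.size else 0

L, F, n = 4, 8, 12                  # illustrative sizes with F >= 2L
g = np.array([1.0, 2.0, 2.0, 1.0])  # window supported on {0, ..., L-1}

widths = [
    bandwidth(np.outer(a, a.conj()))
    for k in range(n - L + 1)       # translation step b = 1
    for f in range(F)
    for a in [gabor_atom(g, k, f, F, n)]
]
```

Every matrix in the family has bandwidth exactly $L-1$ here, so linear combinations of them cannot leave the band of $2L-1$ diagonals.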

The reproducing condition (i.e. of the projection onto E) implies that Q must satisfy: By working out this condition we obtain:

The fourth observation: We are now able to reconstruct up to $L-1$ diagonals of $K_x$. This means we can estimate the products $x_t \overline{x_{t-j}}$ that lie on those diagonals. Assuming we already estimated $x_s$ for $s < t$, we estimate $x_t$ by solving a minimization problem that involves some $J \le L-1$ and weights $w_1, \dots, w_J$. Remark: This algorithm is similar to the one in the Nawab, Quatieri, Lim [’83] IEEE paper.
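One way to picture the sequential step is a weighted least squares fit of $x_t$ to the estimated lag products. The objective used below, $\sum_{j=1}^{J} w_j \, |p_j(t) - x_t \overline{x_{t-j}}|^2$ with $p_j(t)$ an estimate of $x_t \overline{x_{t-j}}$, is an assumption for illustration and not necessarily the functional used in the talk; its minimizer has the closed form in the code. The sketch also assumes the first $J$ samples are given, which in practice fixes the unknown global phase.

```python
import numpy as np

def reconstruct_sequential(p, x_init, w):
    """Estimate x[t] one sample at a time from lag products.

    p[j-1, t] approximates x[t] * conj(x[t-j]) for j = 1..J (assumed input),
    x_init holds the first J samples, w holds the weights w_1..w_J.
    Each step minimizes sum_j w_j * |p_j(t) - x_t * conj(x_{t-j})|^2 over x_t.
    """
    J, T = p.shape
    x = np.zeros(T, dtype=complex)
    x[:J] = x_init
    for t in range(J, T):
        past = x[t - 1 - np.arange(J)]           # x[t-1], ..., x[t-J]
        num = np.sum(w * p[:, t] * past)
        den = np.sum(w * np.abs(past) ** 2)      # assumed nonzero
        x[t] = num / den
    return x

# Noise-free illustration with exact lag products
rng = np.random.default_rng(0)
T, J = 20, 3
x_true = rng.standard_normal(T) + 1j * rng.standard_normal(T)
p = np.zeros((J, T), dtype=complex)
for j in range(1, J + 1):
    p[j - 1, j:] = x_true[j:] * x_true[:-j].conj()

x_rec = reconstruct_sequential(p, x_true[:J], np.ones(J))
```

With exact products the recursion reproduces the signal; with noisy products the errors in earlier samples propagate, which is where the choice of weights and of $J$ matters.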

Reconstruction Scheme • Putting all blocks together we get: [Figure: two-stage block diagram labeled Stage 1 and Stage 2. Inputs: $|c_{k,0}|^2, \dots, |c_{k,F-1}|^2$. Blocks shown: IFFT, $W_0, \dots, W_{L-1}$, Least Squares Solver.]
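The IFFT block acting on the squared magnitudes can be understood through the standard autocorrelation identity: the inverse FFT of $|c_{k,f}|^2$ over $f$ is the autocorrelation of the windowed frame, and with $F \ge 2L-1$ bins there is no wrap-around for lags $0, \dots, L-1$. The sketch below verifies this for translation step $b = 1$; the function names and test signal are illustrative, and how these lag sequences map onto the blocks of the diagram is not spelled out here.

```python
import numpy as np

def stage1_lags(power, L):
    """power[k, f] = |c_{k,f}|^2 over F >= 2L-1 bins -> r[k, tau] for tau = 0..L-1."""
    return np.fft.ifft(power, axis=1)[:, :L]

def direct_lags(frames):
    """Reference: r[k, tau] = sum_t y_k[t + tau] * conj(y_k[t]) for windowed frames y_k."""
    n_frames, L = frames.shape
    r = np.zeros((n_frames, L), dtype=complex)
    for tau in range(L):
        r[:, tau] = np.sum(frames[:, tau:] * frames[:, : L - tau].conj(), axis=1)
    return r

rng = np.random.default_rng(1)
L, F = 4, 8                               # illustrative sizes with F >= 2L
g = np.array([1.0, 2.0, 2.0, 1.0])        # window supported on {0, ..., L-1}
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)

frames = np.stack([x[k : k + L] * g for k in range(len(x) - L + 1)])  # b = 1
power = np.abs(np.fft.fft(frames, n=F, axis=1)) ** 2                  # |c_{k,f}|^2
lags = stage1_lags(power, L)
```

Each entry `lags[k, tau]` is a windowed sum of products $x_{t+\tau} \overline{x_t}$, that is, a linear combination of entries on the $\tau$-th diagonal of $K_x$, computed from magnitudes alone.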

4. Numerical Example

Conclusions • All is well but ... • For nice analysis windows (Hamming, Hanning, Gaussian) the set $\{K_{g_{k,f}}\}$ DOES NOT form a frame for its span! The lower frame bound is 0. This is the (main) reason for the observed numerical instability! • Solution: Regularization.
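The slides do not say which regularization is meant. As one common possibility, the sketch below uses Tikhonov (ridge) regularization for a linear least squares step: adding $\lambda \|z\|^2$ to the objective keeps the normal equations invertible even when the system matrix is rank deficient, which mirrors the vanishing lower frame bound. The function name and the toy system are assumptions.

```python
import numpy as np

def tikhonov_solve(A, y, lam):
    """Minimize ||A z - y||^2 + lam * ||z||^2 via the regularized normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + lam * np.eye(n), A.conj().T @ y)

# Rank-deficient toy system: the unregularized normal equations are singular
A = np.array([[1.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 2.0])
z = tikhonov_solve(A, y, lam=1e-6)
```

For small $\lambda$ the result approaches the minimum-norm least squares solution (here close to [1, 1]); larger $\lambda$ trades fidelity for stability.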
