
“Taking Signals to Bits” Sparse Representations for Coding, Sampling and Source Separation






Presentation Transcript


  1. “Taking Signals to Bits” Sparse Representations for Coding, Sampling and Source Separation. Mike Davies, University of Edinburgh. With thanks to my family, students & colleagues.

  2. What are signals? A signal is a time (or space) varying quantity that can carry information. The concept is broad, and hard to define precisely. (Wikipedia)

  3. Signals are everywhere! … in audio, … in image/video, … in medical physics, … in telecommunications.

  4. Part I: Sparse Representations and Coding

  5. What are signals made of? The frequency viewpoint (Joseph Fourier): signals can be built from a sum of harmonic functions (sine waves), weighted by Fourier coefficients.
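
A minimal numerical sketch of this idea (not from the slides; the frequencies and amplitudes are arbitrary illustrative choices): build a signal from two harmonic functions and recover their Fourier coefficients with the FFT.

```python
import numpy as np

N = 64
t = np.arange(N)
# Signal built from two sine waves, at 4 and 9 cycles per frame.
signal = 3.0 * np.sin(2 * np.pi * 4 * t / N) + 1.5 * np.sin(2 * np.pi * 9 * t / N)

coeffs = np.fft.fft(signal) / N                # Fourier coefficients
reconstructed = np.real(np.fft.ifft(coeffs * N))

# Only the two chosen harmonics (and their conjugate bins) are nonzero.
print(np.flatnonzero(np.abs(coeffs) > 1e-9))   # bins 4, 9, 55, 60
```

The signal is exactly recovered from its coefficients, and only four of the 64 coefficients are nonzero: the frequency representation is already sparse for this signal.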

  6. Sampling and the digital revolution. Today we are more familiar with discrete signals (e.g. audio files, digital images). This is thanks to the Whittaker–Kotelnikov–Shannon sampling theorem: “Exact reconstruction of a continuous-time signal from discrete samples is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth.” Sampling below this rate introduces aliasing.
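
A quick sketch of the aliasing the slide warns about (example parameters of my own choosing): a 7 Hz cosine sampled at only 10 Hz, below its 14 Hz Nyquist rate, produces exactly the same samples as a 3 Hz cosine.

```python
import numpy as np

fs = 10.0                  # sampling frequency (Hz), below the Nyquist rate
n = np.arange(20)          # sample indices
samples_7hz = np.cos(2 * np.pi * 7 * n / fs)
samples_3hz = np.cos(2 * np.pi * 3 * n / fs)

# The two sets of samples are indistinguishable: 7 Hz aliases to 3 Hz.
print(np.allclose(samples_7hz, samples_3hz))   # True
```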

  7. Audio representations. Which representation is best: time or frequency? [Figure: the same audio signal shown in the time domain and in the frequency domain.]

  8. Audio representations: time and frequency (Gabor). “Theory of Communication,” J. IEE (London), 1946: “… a new method of analysing signals is presented in which time and frequency play symmetrical parts…” [Figure: a Gabor ‘atom’, time (s) vs frequency (Hz).]
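
A minimal sketch of a Gabor ‘atom’, a sinusoid localised in time by a Gaussian window (the sample rate, centre, width and frequency here are illustrative, not from the slides):

```python
import numpy as np

fs = 8000.0                          # sample rate (Hz), assumed for illustration
t = np.arange(0, 0.05, 1 / fs)       # 50 ms of time
centre, width, freq = 0.025, 0.005, 440.0

# Gaussian window times a sinusoid: localised in both time and frequency.
atom = np.exp(-((t - centre) ** 2) / (2 * width ** 2)) * np.cos(2 * np.pi * freq * t)
atom /= np.linalg.norm(atom)         # unit-energy atom, as used in dictionaries
```

Dictionaries of such atoms at different times, frequencies and scales underlie modern time-frequency audio analysis.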

  9. Gabor and audio coding. Time and frequency (Gabor), “Theory of Communication,” J. IEE (London), 1946: “… In Part 3, suggestions are discussed for compressed transmission and reproduction of speech or music…” Modern audio coders owe as much to Gabor’s notion of time-frequency analysis as they do to Shannon’s paper of a similar title, two years later, that heralded the birth of information and coding theory: C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, 1948.

  10. Image representations: space and scale, the wavelet viewpoint (Daubechies, Ten Lectures on Wavelets, SIAM, 1992). Images can be built of sums of wavelets. These are multi-resolution edge-like (image) functions.
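
A minimal one-level Haar wavelet transform (a sketch, using the simplest wavelet rather than any particular one from the slides) shows why wavelets suit edge-like structure: averages capture coarse brightness, differences capture edges.

```python
import numpy as np

def haar_1d(x):
    """One-level orthonormal Haar analysis: pairwise averages and differences."""
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)
    diff = (x[0::2] - x[1::2]) / np.sqrt(2)
    return avg, diff

def inverse_haar_1d(avg, diff):
    """Exact inverse of haar_1d."""
    x = np.empty(2 * len(avg))
    x[0::2] = (avg + diff) / np.sqrt(2)
    x[1::2] = (avg - diff) / np.sqrt(2)
    return x

# A piecewise-constant "image row" with one edge:
x = np.array([4.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0])
avg, diff = haar_1d(x)
print(diff)   # only one nonzero detail coefficient, at the edge
```

Most detail coefficients are zero: the representation is sparse, yet inverting the transform recovers the signal exactly.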

  11. Transform sparsity. What makes a good transform? [Figure: the “Tom” image and its wavelet-domain coefficients.] Good representations are efficient: sparse!

  12. Coding signals of interest. What is the difference between quantizing a signal/image in the transform domain rather than the signal domain? [Figures: Tom’s nonzero wavelet coefficients; quantization in the wavelet domain vs the pixel domain, compressed to 3, 2, 1, 0.5 and 0.1 bits per pixel.]
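
A numerical sketch of the difference (my own toy example, with keep-the-largest-coefficients standing in for coarse quantization): approximate a 16-sample “edge” with only 8 coefficients, in the raw sample (“pixel”) domain vs the Haar wavelet domain.

```python
import numpy as np

def keep_k_largest(c, k):
    """Zero all but the k largest-magnitude coefficients."""
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-k:]
    out[idx] = c[idx]
    return out

def haar(x):
    """One-level orthonormal Haar analysis (averages then differences)."""
    return np.concatenate([(x[0::2] + x[1::2]) / np.sqrt(2),
                           (x[0::2] - x[1::2]) / np.sqrt(2)])

def inv_haar(c):
    avg, diff = np.split(c, 2)
    x = np.empty(c.size)
    x[0::2] = (avg + diff) / np.sqrt(2)
    x[1::2] = (avg - diff) / np.sqrt(2)
    return x

x = np.repeat([5.0, 2.0], 8)     # a 16-sample piecewise-constant "image row"
k = 8
err_pixel = np.linalg.norm(x - keep_k_largest(x, k))
err_haar = np.linalg.norm(x - inv_haar(keep_k_largest(haar(x), k)))
print(err_haar < err_pixel)      # True: the sparse transform domain wins
```

In the pixel domain, half the samples are simply discarded; in the wavelet domain, 8 coefficients already describe the signal exactly.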

  13. Learning better representations. Recent efforts have targeted learning better representations for a given set of signals, x(t): that is, learning dictionaries of functions that represent signals of interest with only a small number of significant coefficients, ck. For images: Olshausen and Field, Nature, 1996. For audio: Abdallah & Plumbley, Proc. ICA 2001.

  14. Build bigger dictionaries. Another approach is to build bigger dictionaries to provide more flexible descriptions. Consider the following test signal: Heisenberg’s uncertainty principle implies that a time-frequency analysis has either good time resolution and poor frequency resolution, or good frequency resolution and poor time resolution.

  15. Multi-resolution representations. Heisenberg only applies to time-frequency analysis, NOT time-frequency synthesis. Consider a TF synthesis representation with a combination of long (40 ms) atoms and short (5 ms) atoms: a good frequency representation plus a good time representation gives a combined representation. Finding the sparse coefficients is now a nonlinear (and potentially expensive) operation. This leads to new uncertainty principles for sparse representations.
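
One standard nonlinear method for finding such sparse coefficients is greedy matching pursuit; the sketch below uses a random overcomplete dictionary for simplicity, where the slides’ dictionary would contain long and short Gabor atoms.

```python
import numpy as np

def matching_pursuit(y, D, n_iter):
    """Greedily pick the atom most correlated with the residual, repeat."""
    residual = y.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        correlations = D.T @ residual
        k = np.argmax(np.abs(correlations))     # best-matching atom
        coeffs[k] += correlations[k]
        residual = residual - correlations[k] * D[:, k]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 96))               # overcomplete: 96 atoms in dim 32
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
y = 2.0 * D[:, 5] - 1.0 * D[:, 40]              # a signal built from 2 atoms
coeffs, residual = matching_pursuit(y, D, n_iter=20)
print(np.linalg.norm(residual) < np.linalg.norm(y))   # residual shrinks
```

Each iteration is cheap, but many iterations over a large dictionary is exactly the “potentially expensive” nonlinear computation the slide mentions.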

  16. Part I Review: Sparse Representations and Coding • How we represent signals is very important • Sparse representations provide good compression • Recent efforts have targeted bigger and better representations • Although the representations are linear, nonlinear approximation plays an important role.

  17. Part II: Sparse Representations and Source Separation

  18. Separating mixtures of signals. Mathematical model: multiple source signals (indexed by j) pass through a mixing process to give the observed signals (indexed by i). Aim: extract the sources, sj(t), from the observations, xi(t), often without knowledge of the mixing process (blind).

  19. Biomedical example: foetal ECG signal extraction. Using sensors placed on a mother's abdomen, foetal electrical heart activity can be monitored. Observed signals typically contain the maternal heartbeat, the foetal heartbeat and interference from muscle activity, e.g. breathing. Blind source separation aims to extract the components associated with each activity. [Figure: leads 1 and 2 decomposed into maternal, foetal (×2 magnification) and noise (×10 magnification) components over time.]

  20. Audio example: the “Cocktail Party” problem. A microphone array can monitor the acoustic activity in a room. Aim: separate out the individual speakers. [Figure: sources s1(t), s2(t); observations x1(t), x2(t).]

  21. Blind source separation: separating the sources without knowledge of the mixing process. Really there are two problems: • Identify the mixing process. This can be solved if the sources are statistically independent and non-Gaussian (Independent Component Analysis, ICA). • Separate the sources given the mixing process. Easy if the mixing process is invertible (just invert the mixing matrix); difficult otherwise, e.g. when the number of source components > the number of channels.
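
A sketch of the easy case (toy data and mixing matrix of my own choosing): with as many channels as sources and a known, invertible mixing matrix A, separation really is just a matrix inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal((2, 1000))    # 2 source signals, 1000 samples each
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])            # invertible mixing matrix (illustrative)

x = A @ s                             # 2 observed mixtures
s_hat = np.linalg.inv(A) @ x          # recover the sources exactly
print(np.allclose(s_hat, s))          # True
```

The hard parts, which ICA addresses, are that A is unknown in the blind setting and that A is not invertible when there are more sources than channels.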

  22. More sources than sensors. When the number of sources > the number of sensors, the model is said to be overcomplete, e.g. 3 speakers and 2 microphones. Even if we could determine the mixing process, it is not invertible. We need additional help: sparsity again! [Figure: time-domain scatter plot vs time-frequency scatter plot after a sparse transform.]

  23. Sparsity and separability. Sparse signals can be approximately separated by simple nonlinear operations, e.g. binary thresholding with an “ideal” binary mask (source 1: black = 0, white = 1; source 2: black = 1, white = 0).
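
A toy sketch of binary masking (synthetic coefficients, disjoint by construction; real time-frequency coefficients are only approximately disjoint): two sparse sources sharing one mixture are split exactly by thresholding.

```python
import numpy as np

n = 256
source1 = np.zeros(n); source1[::16] = 3.0    # sparse "coefficients", and the
source2 = np.zeros(n); source2[5::16] = 2.0   # two supports do not overlap
mixture = source1 + source2

mask = np.abs(mixture) > 2.5                  # "ideal" binary mask for source 1
est1 = mixture * mask
est2 = mixture * ~mask
print(np.allclose(est1, source1) and np.allclose(est2, source2))   # True
```

Because at most one source is active at each coefficient, a simple 0/1 mask recovers both sources perfectly; with real audio the recovery is approximate.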

  24. Blind sparse source separation. Given stereo data we can use the spatial position of each source to identify which time-frequency atoms belong to which sources. Here is an example with saxophone, piano and percussion. [Figure: left channel only; right channel only; histogram of the relative amplitude of TF atoms, with peaks for percussion, piano and saxophone.]

  25. Sparse (nonlinear) audio separation. Separating instruments (sax, piano, percussion) from an extract from John Coltrane.

  26. Part II Review: Sparse Representations and Source Separation • Representations are important in source separation • Sparse representations provide not only a good basis for compression but also good source separation • Again, nonlinear approximation plays an important role.

  27. Part III: Sparse Representations and Sampling (compressed sensing)

  28. Compressed sensing. Traditionally, when compressing a signal (e.g. speech or images), we take lots of samples (sampling theorem), move to a transform domain and then throw most of the coefficients away! Why can't we just sample signals at the “information rate”? This is the philosophy of compressed sensing: E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Information Theory, 2006; D. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, 2006.

  29. Compressed sensing. The compressed sensing principle: • Take a small number of linear observations of a signal (number of observations << number of samples/pixels) • Use nonlinear reconstruction to estimate the signal via a transform domain in which the signal is sparse. Theoretical results: we can match the approximation performance of keeping the M most significant coefficients of a signal/image (in a sparse domain) using a fixed number of non-adaptive linear observations, as long as the number of observations ~ M × log N (N, the size of the full signal); this holds for almost all (random) observations, and can be achieved with practical reconstruction algorithms.
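
A small sketch of the principle with one practical greedy reconstruction algorithm, orthogonal matching pursuit (the slides' own results use ACGP/GP, which is not reproduced here; the dimensions and sparse signal below are illustrative):

```python
import numpy as np

def omp(y, Phi, k):
    """Orthogonal matching pursuit: grow a support greedily, refit by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = np.argmax(np.abs(Phi.T @ residual))       # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat, residual

rng = np.random.default_rng(2)
N, M, K = 64, 20, 3                                   # M << N linear observations
Phi = rng.standard_normal((M, N)) / np.sqrt(M)        # random measurement matrix
x = np.zeros(N); x[[3, 17, 42]] = [1.0, -2.0, 1.5]    # K-sparse signal
y = Phi @ x                                           # compressed measurements
x_hat, residual = omp(y, Phi, K)
print(np.linalg.norm(residual) < np.linalg.norm(y))   # residual shrinks
```

Only 20 non-adaptive random measurements of a length-64 signal are taken, yet a nonlinear algorithm exploiting sparsity can reconstruct it; a linear (least-squares) inverse from the same measurements cannot.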

  30. Compressed sensing principle. [Diagram: (1) observed data → nonlinear reconstruction → (2) sparse “Tom” wavelet image → invert transform → (3) X = original “Tom”, with (4) the sparsifying transform mapping the image to a roughly equivalent wavelet image. Note that (1) is a redundant representation of (2).]

  31. Compressed sensing in practice. Compressed sensing ideas can be applied to reduced sampling in Magnetic Resonance Imaging: • MRI samples lines of spatial frequency • Each line takes time and energy, and heats up the patient! The Logan–Shepp phantom image illustrates this: we sample in the spatial Fourier domain (a sub-sampled Fourier transform, ≈ 7× down-sampled, no longer invertible), but the image we wish to recover is sparse in the Haar wavelet domain.

  32. Compressed sensing in practice. Using recently developed algorithms ACGP/GP (Blumensath and Davies, “Gradient Pursuits,” 2007) we can achieve the following results: [Figure: PSNR in dB vs the ratio of measurements to signal dimension, for the best linear reconstruction and for nonlinear reconstruction (ACGP); ACGP achieves perfect reconstruction at ~15%.]

  33. Compressed sensing in practice. However, what we really want to do is reconstruct real data… [Figure: original vs linear reconstruction (4× under-sampled) vs nonlinear reconstruction (4× under-sampled); data courtesy of Ian Marshall & Terry Tao, SFC Brain Imaging Centre.] …but this reconstruction took us a very long time!

  34. Compressed sensing applications. Compressed sensing provides a new way of thinking about signal acquisition. Application areas include: • Medical imaging • Distributed sensing • Remote sensing • Very fast analogue-to-digital conversion (DARPA A2I research program). Still many unanswered questions… Coding efficiency? Restricted observation domains? Etc.

  35. Summary • Sparse representations provide a powerful mathematical model for many natural signals in signal processing and are a basis for: good compression; good source separation; and efficient sampling • There is an interesting interplay between linear representations and nonlinear approximation • Compressed sensing is only in its infancy…

  36. Final Thought. In compressed sensing we have finally learnt that taking signals to bits is easy… it's putting the bits back together again that's difficult.
