Independent Component Analysis

Independent Component Analysis Lecture 12

What is ICA and why is it used?
• Extract useful information from large amounts of data:
  • Extract features from brain images
  • Extract independent voices from a mixture of voices
  • Extract variables that influence stock prices
• Separate data into underlying components given very little information about the nature of the components: BLIND SOURCE SEPARATION

ICA & Signals
[Figure: Source A and Source B are combined into Mixture 1 and Mixture 2; ICA recovers Source A and Source B from the two mixtures.]

Code for previous example

    % create source 1
    v = 0:100;
    s1 = sin(v/2);                  % sinusoid
    plot(s1)
    % create source 2
    s2 = ((rem(v,23)-11)/9).^5;     % "funny curve"
    figure, plot(s2)
    % generate the mixtures; the number of mixtures equals the number of sources
    mix1 = 0.7*s1 + 0.3*s2;
    mix2 = 0.8*s1 + 0.3*s2;
    figure, plot(mix1)
    figure, plot(mix2)
    % stack the mixtures into a single array, one mixture per row
    mixedsig = [];
    mixedsig(1,:) = mix1;
    mixedsig(2,:) = mix2;
    % use FastICA to separate the mixtures into their components
    % (fastica is provided by the FastICA package, which must be on the MATLAB path)
    [sources, A, W] = fastica(mixedsig);
    % extract the estimated sources
    news1 = sources(1,:);
    news2 = sources(2,:);
    figure, plot(news1)
    figure, plot(news2)

ICA & Sound
[Figure: Sound Sources 1, 2 and 3 are combined into Mixtures 1, 2 and 3; ICA separates the mixtures into Outputs 1, 2 and 3.]

    % read in the source files
    % (audioread replaces wavread, which has been removed from recent MATLAB releases)
    [sc1, fs1] = audioread('s1.wav');
    [sc2, fs2] = audioread('s2.wav');
    [sc3, fs3] = audioread('s3.wav');
    % play the sources; press a key after each one to continue
    wav1 = audioplayer(sc1, fs1);
    wav2 = audioplayer(sc2, fs2);
    wav3 = audioplayer(sc3, fs3);
    play(wav1); pause
    play(wav2); pause
    play(wav3); pause
    % generate the mixtures; the number of mixtures equals the number of sources
    % (this assumes mono recordings of equal length and equal sampling rate)
    mix1 = 0.2*sc1 + 0.3*sc2 + 0.5*sc3;
    mix2 = 0.5*sc1 + 0.3*sc2 + 0.2*sc3;
    mix3 = 0.2*sc1 + 0.5*sc2 + 0.3*sc3;
    % play the mixtures
    wav1 = audioplayer(mix1, fs1);
    wav2 = audioplayer(mix2, fs2);
    wav3 = audioplayer(mix3, fs3);
    play(wav1); pause
    play(wav2); pause
    play(wav3); pause
    % stack the mixtures into a single array, one mixture per row
    % (audioread returns column vectors, hence the transposes)
    mixedsig = [];
    mixedsig(1,:) = mix1.';
    mixedsig(2,:) = mix2.';
    mixedsig(3,:) = mix3.';
    % use FastICA to separate the voice components
    [sources, A, W] = fastica(mixedsig);
    % extract the estimated sources
    source1 = sources(1,:);
    source2 = sources(2,:);
    source3 = sources(3,:);
    % play the estimated sources
    wav1 = audioplayer(source1, fs1);
    wav2 = audioplayer(source2, fs2);
    wav3 = audioplayer(source3, fs3);
    play(wav1); pause
    play(wav2); pause
    play(wav3);

Results & Assumptions
• The separated output signals may be inverted (i.e. their sign may be changed)
• The output signal amplitude may also be different from that of the original amplitude
• For sound samples: the inputs are assumed to be sampled at the same time, i.e. there is no differential delay between the recordings (in practice this may be a problem, since the sound may arrive at the different recording devices with slightly different delays)
• The source signals are assumed to be clear and free of background noise
• There must be at least as many different mixtures of a set of source signals as there are source signals
We will see the reasons for the above effects and assumptions later.

The principle behind ICA: INDEPENDENCE
• We observe the following from the previous example:
  • The amplitude of one sound at a given time is unrelated to the amplitude of another sound at the same time, since the two sounds are generated by unrelated processes
[Figure: Source 1, Source 2, and the scattergram of Source 1 against Source 2]

The principle behind ICA: INDEPENDENCE
The scattergram shows the amplitude of one source signal at each time plotted against the amplitude of the other source signal at the same time. The resulting distribution does not show any obvious pattern, which suggests that the two sources are independent.

    % example to show independence
    v = 0:100;
    w1 = sin(v);                    % sinusoid
    w2 = (rem(v,27)-13)/9;          % saw-tooth
    plot(w1)
    xlabel('Time'); ylabel('Amplitude');
    figure, plot(w2)
    xlabel('Time'); ylabel('Amplitude');
    % scatter plot of one source against the other
    figure, scatter(w1,w2);
    xlabel('Source1 Amplitude'); ylabel('Source2 Amplitude');
    title('Scatter Plot');          % text(5,5,...) would fall outside the plotted range

The principle behind ICA: INDEPENDENCE
• Since the sounds are unrelated, try to extract unrelated time-varying signals from the mixtures
• The extracted signals should then be the original sound signals
• If different signals come from different sources, then those signals are statistically independent
• Reversing this assumption: if statistically independent signals are extracted from the mixtures, then these extracted signals must come from different (independent) sources (e.g. two different people speaking)
• This is the principle behind extracting signals in Blind Source Separation using ICA

ICA vs. PCA & FA
• Principal Component Analysis (PCA) and Factor Analysis (FA) are conventional methods used for analyzing data
• PCA and FA find signals that are uncorrelated with each other, whereas ICA finds signals that are independent of each other
• Independence implies a lack of correlation, but the reverse is not true
• Hence PCA and FA would give as outputs a set of new voice mixtures which are uncorrelated but not necessarily independent
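The claim that a lack of correlation does not imply independence can be checked numerically. The following is a minimal sketch in MATLAB, matching the rest of the lecture code; the signals x and y are made up for illustration and do not come from the lecture. y is completely determined by x, yet the correlation between them is zero.

```matlab
% Uncorrelated does not imply independent (illustrative signals, not from the lecture).
x = linspace(-1, 1, 201);   % amplitudes placed symmetrically about zero
y = x.^2;                   % y is a deterministic function of x, so it depends on x

% Linear correlation between x and y: zero up to rounding error.
c = corrcoef(x, y);
r_xy = c(1, 2);

% For independent signals, E[f(x) g(y)] = E[f(x)] E[g(y)] for any functions f and g.
% Here the identity fails for f(x) = x^2 and g(y) = y, which exposes the dependence.
gap = mean(x.^2 .* y) - mean(x.^2) * mean(y);

fprintf('correlation = %.3g, dependence gap = %.3g\n', r_xy, gap);
```

A method that only removes correlation (such as PCA) would treat x and y as already separated, whereas a method that looks for independence would not.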

Properties of signal mixtures: INDEPENDENCE
• In order to extract the original signals from the signal mixtures, it is important to first understand the properties of signal mixtures
• Whereas the source signals are statistically independent (see the source scattergram), their signal mixtures are not
• In the graph below, when the amplitude of one of the signal mixtures increases, the amplitude of the other mixture also increases

[Figure: Source scattergram (Source 1 against Source 2). The source signal distribution does not indicate any obvious pattern, thereby implying that the two sources are independent.]
[Figure: Mixture scattergram. In contrast, the signal mixture distribution shows that the mixtures are correlated.]

    % example to show independence of sources and correlation of mixtures
    v = 0:100;
    w1 = sin(v);                    % sinusoid
    w2 = (rem(v,27)-13)/9;          % saw-tooth
    plot(w1)
    xlabel('Time'); ylabel('Amplitude');
    figure, plot(w2)
    xlabel('Time'); ylabel('Amplitude');
    % scatter plot of the sources
    figure, scatter(w1,w2);
    xlabel('Source1 Amplitude'); ylabel('Source2 Amplitude');
    title('Scatter Plot');          % text(5,5,...) would fall outside the plotted range
    % generate signal mixtures
    mix1 = 0.7*w1 + 0.3*w2;
    mix2 = 0.8*w1 + 0.3*w2;
    % plot the mixtures
    figure, plot(mix1)
    xlabel('Time'); ylabel('Amplitude');
    figure, plot(mix2)
    xlabel('Time'); ylabel('Amplitude');
    % scatter plot of the signal mixtures
    figure, scatter(mix1,mix2);
    xlabel('Mixture1 Amplitude'); ylabel('Mixture2 Amplitude');
    title('Scatter Plot');
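The visual impression from the two scattergrams can be backed up with a number. As a rough sketch, the snippet below computes the correlation coefficient of the two sources and of the two mixtures, using the same signals and mixing coefficients as the example above. The variable names r_src and r_mix are introduced here for illustration. The mixture correlation should come out close to 1, while the source correlation should be considerably weaker.

```matlab
% Correlation of the sources versus correlation of their mixtures.
v  = 0:100;
w1 = sin(v);                       % sinusoid
w2 = (rem(v, 27) - 13) / 9;        % saw-tooth
mix1 = 0.7*w1 + 0.3*w2;
mix2 = 0.8*w1 + 0.3*w2;

c_src = corrcoef(w1, w2);          % 2-by-2 matrix; the off-diagonal entry is the correlation
c_mix = corrcoef(mix1, mix2);
r_src = c_src(1, 2);
r_mix = c_mix(1, 2);

fprintf('source correlation  = %.3f\n', r_src);
fprintf('mixture correlation = %.3f\n', r_mix);
```

Note that correlation only captures linear relationships, so a small source correlation is consistent with independence but does not prove it.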

Properties of signal mixtures: NORMALITY
• The histogram of a source signal is usually either 'peaky' or 'flat', depending on the amplitude variations of the source signal.
[Figure: Signal 1 and Signal 2 with their histograms, Histogram 1 and Histogram 2]

Properties of signal mixtures: NORMALITY
• Any mixture of source signals has a histogram that tends to be more bell-shaped (normal or Gaussian) than that of any of its constituent source signals, even if the source signals have very different histograms
[Figure: Signal Mixture and Histogram of Mixture]
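One way to put a number on "more bell-shaped" is the excess kurtosis of a signal, which is 0 for a Gaussian, negative for flat histograms and positive for peaky ones. The sketch below is an illustration under stated assumptions rather than part of the lecture: the helper excess_kurtosis and the grid-based sources are made up here, and the sources are constructed so that every pair of amplitudes occurs exactly once, which makes them independent of each other.

```matlab
% Excess kurtosis as a rough measure of how Gaussian a histogram is
% (0 for a Gaussian, negative for flat histograms, positive for peaky ones).
excess_kurtosis = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2 - 3;

% Two flat (uniform) sources built on a grid so that every pair of amplitudes
% occurs exactly once; this makes the two sources independent of each other.
u = linspace(-1, 1, 201);
[U1, U2] = meshgrid(u, u);
s1 = U1(:)';
s2 = U2(:)';

mixture = s1 + s2;                 % equal-weight mixture of the two sources

k_s1  = excess_kurtosis(s1);
k_s2  = excess_kurtosis(s2);
k_mix = excess_kurtosis(mixture);
fprintf('sources: %.2f and %.2f, mixture: %.2f\n', k_s1, k_s2, k_mix);
```

Each source has an excess kurtosis of about -1.2 (a flat histogram), while the mixture comes out at about -0.6, i.e. closer to the Gaussian value of 0.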

    % HISTOGRAMS TO SHOW NORMALITY
    v = 0:100;
    w1 = sin(v);                    % sinusoid
    w2 = (rem(v,27)-13)/9;          % saw-tooth
    plot(w1)
    xlabel('Time'); ylabel('Amplitude');
    % histogram of the first signal
    figure, hist(w1,20)
    xlabel('Amplitude'); ylabel('Count');
    figure, plot(w2)
    xlabel('Time'); ylabel('Amplitude');
    % histogram of the second signal
    figure, hist(w2,20)
    xlabel('Amplitude'); ylabel('Count');
    % generate signal mixture
    w3 = w1 + w2;
    figure, plot(w3)
    xlabel('Time'); ylabel('Amplitude');
    % histogram of the signal mixture
    figure, hist(w3,20)
    xlabel('Amplitude'); ylabel('Count');

Properties of signal mixtures: COMPLEXITY
• The temporal complexity of any mixture is greater than (or equal to) that of its simplest (least complex) constituent source signal
• Hence extracting the least complex signal from a set of signal mixtures yields a source signal
[Figure: two source signals added together to give a more complex mixture]

Properties of signal mixtures: COMPLEXITY
The complexity of the mixture obtained by adding a sinusoidal and a saw-tooth signal is greater than the complexity of either of its constituents.

    % example to show complexity
    v = 0:100;
    w1 = sin(v);                    % sinusoid
    w2 = (rem(v,27)-13)/9;          % saw-tooth
    plot(w1)
    xlabel('Time'); ylabel('Amplitude');
    figure, plot(w2)
    xlabel('Time'); ylabel('Amplitude');
    % signal mixture
    w3 = w1 + w2;
    % plot mixture
    figure, plot(w3)
    xlabel('Time'); ylabel('Amplitude');

UNMIXING SIGNALS
• Using the 3 properties of signal mixtures, one may extract the source signals as follows:
  • Independence: extracting independent signals from the set of signal mixtures should recover the required source signals
  • Normality: extracting signals with non-Gaussian histograms from the mixtures will recover the source signals
  • Complexity: extracting signals with low complexity recovers the source signals

UNMIXING SIGNALS
Hence, if the source signals are independent, are non-Gaussian or have low complexity, we can use these properties to extract the source signals from their mixtures.
We will see later that knowing the difference between the properties of source signals and mixture signals is crucial to the process of extracting the source signals.

Mixing & Unmixing Matrices
• A time-varying source signal is represented by a scalar variable $s$ whose values over time are
$$s = (s_1, s_2, s_3, \ldots, s_N)$$
• $s_1$, $s_2$, etc. represent the amplitude of the signal at times 1, 2, …, $N$ milliseconds respectively
• The amplitude could vary in time (as above) or in space (e.g. in TV signals, where the intensity values of pixels vary from point to point)
[Figure: all 50,000 samples of a speech signal]
[Figure: first 10,000 samples of the same speech signal]

Mixing & Unmixing Matrices
• If we had two time-varying signals $s_1$ and $s_2$ as follows:
$$s_1 = (s_{11}, s_{12}, s_{13}, \ldots, s_{1N})$$
$$s_2 = (s_{21}, s_{22}, s_{23}, \ldots, s_{2N})$$
• The amplitude of the first signal during the 3rd millisecond is represented as $s_{13}$; similarly, the amplitude of the second signal during the 3rd millisecond is $s_{23}$
• Together they may be represented by a vector variable $s_3$ (the subscript here is the time index) as follows:
$$s_3 = \begin{pmatrix} s_{13} \\ s_{23} \end{pmatrix} = (s_{13}, s_{23})^T$$

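Mixing & Unmixing Matrices
$s_3$ represents the pair of amplitudes during the third millisecond. The amplitudes of both signals over $N$ milliseconds may be represented by a vector variable $s$ as follows, where each entry $s_{it}$ is a scalar:
$$s = \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} = \begin{pmatrix} s_{11}, s_{12}, s_{13}, \ldots, s_{1N} \\ s_{21}, s_{22}, s_{23}, \ldots, s_{2N} \end{pmatrix}$$
Equivalently, reading the same array column by column,
$$s = (s_1, s_2, s_3, \ldots, s_N)$$
where each $s_t = (s_{1t}, s_{2t})^T$ now represents a vector: the pair of amplitudes at time $t$.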
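In MATLAB this arrangement corresponds to a matrix with one signal per row and one time step per column. The sketch below uses made-up amplitudes purely for illustration; the names S, s_13 and s_3 are chosen here to mirror the notation above.

```matlab
% Two source signals stored as the rows of one matrix (toy amplitudes for illustration).
s1 = [0.1 0.4 0.9 0.4 0.1];       % amplitudes of signal 1 at times 1..5
s2 = [1.0 0.5 0.0 -0.5 -1.0];     % amplitudes of signal 2 at times 1..5
S  = [s1; s2];                    % 2-by-N: row i is signal i, column t is time t

s_13 = S(1, 3);                   % scalar: amplitude of signal 1 at time 3
s_3  = S(:, 3);                   % 2-by-1 vector: both amplitudes at time 3
disp(s_3');                       % displays the pair (s_13, s_23)
```

This is the same layout the earlier examples pass to fastica: one signal per row.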

Orthogonal Projection
• A graphical representation of the two source signal amplitudes may be plotted as follows:
1. Each plotted point represents the amplitudes of both source signals at one point in time
2. The amplitude of each signal corresponding to a data point may be obtained by drawing a line from that point onto each of the axes
3. Amplitudes of $s_1$ can be obtained by drawing a line from each data point onto the horizontal axis $S_1$. The amplitudes of $s_2$ can be obtained similarly from the axis $S_2$.

Orthogonal Projection
Note that the lines drawn are at right angles to the desired axis. Two lines at right angles are said to be ORTHOGONAL.
The amplitude obtained by drawing an orthogonal line from a data point to an axis is the ORTHOGONAL PROJECTION of that data point onto that axis.

Mixing Signals
• Consider two speech signals $s_1$ and $s_2$
• When two speech signals are recorded by a single microphone, the output is a signal mixture which is a weighted sum of the two signals
• Let this mixture be $x_1$
• The relative proportion of each signal in $x_1$ depends on the loudness of each source signal at its source and on its distance from the microphone

Mixing Signals
• Thus $x_{1t}$ at time $t$ is the weighted sum of the two source signals $s_{1t}$ and $s_{2t}$ at that time:
$$x_{1t} = a \, s_{1t} + b \, s_{2t}$$
• where $a$ and $b$ are the mixing coefficients. Over all $N$ time steps:
$$(x_{11}, x_{12}, \ldots, x_{1N}) = a \, (s_{11}, s_{12}, \ldots, s_{1N}) + b \, (s_{21}, s_{22}, \ldots, s_{2N})$$
$$x_1 = a s_1 + b s_2$$

Mixing Signals
• Similarly, at the second microphone we obtain the mixture output $x_2$:
$$x_2 = c s_1 + d s_2$$
• Now $x_1$ and $x_2$ can be represented by a vector $x = (x_1, x_2)^T$
• The MIXING PROCESS, represented by the mixing coefficients $(a, b, c, d)$, therefore transforms one vector variable $s$ into another vector variable $x$
In other words, each source signal data point $s_t = (s_{1t}, s_{2t})^T$ at a given time $t$ is transformed into a corresponding signal mixture data point $x_t = (x_{1t}, x_{2t})^T$, denoted as $s_t \to x_t$
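The two weighted sums can be computed in one step by arranging the four coefficients in a 2-by-2 array, which the lecture later calls the mixing matrix. The sketch below reuses the two sources and the coefficient values from the first code example; the variable names are otherwise chosen here for illustration.

```matlab
% Mixing two sources with a 2-by-2 array of mixing coefficients.
v  = 0:100;
s1 = sin(v/2);                     % source 1
s2 = ((rem(v, 23) - 11) / 9).^5;   % source 2
s  = [s1; s2];                     % 2-by-N: one source per row

a = 0.7; b = 0.3;                  % coefficients for microphone 1
c = 0.8; d = 0.3;                  % coefficients for microphone 2
A = [a b; c d];

x  = A * s;                        % column t of x is the mixture data point x_t
x1 = x(1, :);                      % same as a*s1 + b*s2
x2 = x(2, :);                      % same as c*s1 + d*s2

err = max(abs(x1 - (a*s1 + b*s2)));
fprintf('largest difference from the weighted sum: %g\n', err);
```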

Mixing Signals
[Figure: the mixing process maps each source data point $s_t$ to a mixture data point $x_t$]

Unmixing Signals
• In the same manner that signals were combined to obtain mixtures using mixing coefficients, one can recover each source signal from these mixtures by recombining the signal mixtures using unmixing coefficients $(\alpha, \beta, \gamma, \delta)$:
$$s_1 = \alpha x_1 + \beta x_2$$
$$s_2 = \gamma x_1 + \delta x_2$$
• BSS finds values for these unmixing coefficients in order to recover the source signals
• IN OTHER WORDS, THIS CONSISTS OF FINDING THE SPATIAL TRANSFORMATION THAT MAPS A SET OF MIXTURES TO A SET OF SOURCE SIGNALS

Linear Transformation • Let the space defined by the source signal axes S1 and S2 be S and the space defined by the mixture axes X1 and X2 be X

Linear Transformation
• The particular transformation induced by the mixing process maps parallel lines in $S$ to parallel lines in $X$. This is known as a linear transformation.
• The orientations and lengths of lines are usually altered by the transformation. These changes are not random but are determined by the mixing coefficients $(a, b, c, d)$

Sources as vectors
• Linear transformations
• Mixing coefficients
• Sources
• Microphones
• Mixtures
• Proportions of sources

Unmixing Vector
• We know now that a source signal $s_1$ can be extracted from a pair of mixtures $x = (x_1, x_2)^T$ by using a pair of unmixing coefficients $(\alpha, \beta)$ to recombine the mixtures:
$$s_1 = \alpha x_1 + \beta x_2$$
• The coefficients $(\alpha, \beta)$ may be represented as a weight vector $w_1 = (\alpha, \beta)^T$, which defines a point with co-ordinates $(\alpha, \beta)$
• The length $|w_1|$ of the vector is the distance of the point $(\alpha, \beta)$ from the origin, i.e. the length of the hypotenuse of the right-angled triangle with sides $\alpha$ and $\beta$:
$$|w_1| = (\alpha^2 + \beta^2)^{1/2}$$
[Figure: weight vector $w_1$ with components $\alpha$ and $\beta$ and orientation $\psi$, drawn in the space with axes "Signal Mixture 1 amplitude" and "Signal Mixture 2 amplitude"]

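Weight Vector & Extraction
How does this weight vector $w_1$ help in extracting a source signal $s_1$ from a pair of mixtures $x$?
• $w_1$ has 2 properties:
  • its length $|w_1| = (\alpha^2 + \beta^2)^{1/2}$ can change
  • its orientation $\psi$ can change
How does length help in extraction?
• Let us change the length of $w_1$ by a factor $\lambda$. Given $w_1 = (\alpha, \beta)^T$, we have $\lambda w_1 = (\lambda\alpha, \lambda\beta)^T$, with length
$$|\lambda w_1| = ((\lambda\alpha)^2 + (\lambda\beta)^2)^{1/2} = |\lambda| \, (\alpha^2 + \beta^2)^{1/2} = |\lambda| \, |w_1|$$
• Applying the rescaled weight vector to a mixture data point $x_t$ gives
$$(\lambda w_1)^T x_t = (\lambda\alpha) x_{1t} + (\lambda\beta) x_{2t} = \lambda (\alpha x_{1t} + \beta x_{2t}) = \lambda s_{1t}$$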

Weight Vector & Extraction
• The length of $w_1$ therefore affects the amplitude of the extracted signal, but does not affect the nature of the signal
• Thus, the extracted signal is a louder or attenuated version of $s_1$, depending on the length of $w_1$
RECALL RESULTS: The output signal amplitude may also be different from that of the original amplitude
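The effect of the length of the weight vector can be checked directly. In the sketch below the mixtures and the weight vector are arbitrary values chosen for illustration (they are not taken from the lecture); the point is only that scaling $w_1$ by $\lambda$ scales the extracted signal by $\lambda$, and that $\lambda = -1$ inverts it.

```matlab
% Scaling the weight vector only rescales the extracted signal.
v  = 0:100;
x1 = 0.7*sin(v/2) + 0.3*cos(v/5);      % mixture 1 (toy signals)
x2 = 0.8*sin(v/2) + 0.3*cos(v/5);      % mixture 2
x  = [x1; x2];

w1 = [2; -1];                          % an arbitrary weight vector (alpha, beta)'
lambda = 3;

y        = w1' * x;                    % signal extracted with w1
y_scaled = (lambda * w1)' * x;         % signal extracted with the longer vector
y_flip   = (-w1)' * x;                 % lambda = -1 inverts the signal

fprintf('max |y_scaled - lambda*y| = %g\n', max(abs(y_scaled - lambda*y)));
fprintf('max |y_flip + y|          = %g\n', max(abs(y_flip + y)));
```

Both printed differences are zero up to rounding error, which matches the earlier observations that ICA outputs may be rescaled or inverted.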

Weight Vector & Extraction
Since the length of the weight vector does not help reveal the nature of any of the sources, orientation must be the property that helps extract a source signal…

Inner Product
RECALL RESULTS: There must be at least as many different mixtures of a set of source signals as there are source signals
How does orientation help in extraction?
THE INNER PRODUCT:
• We know that $w_1 = (\alpha, \beta)^T$, which can be rewritten as $w_1^T = (\alpha, \beta)$
• We also know that $s_{1t} = \alpha x_{1t} + \beta x_{2t}$, which can be rewritten as $s_{1t} = w_1^T x_t$ (Note: $T$ denotes the transpose, $t$ the time index)
• This is the dot product of two vectors, also known as the scalar product or inner product. There must be as many columns in the weight vector $w_1^T$ as there are rows in the mixture vector $x_t$.
• The inner product of $w_1$ and $x_t$ can also be written as
$$s_{1t} = |x_t| \, |w_1| \cos\theta$$
where $\theta$ is the angle between $w_1$ and $x_t$

Inner Product
• The result of an inner product is a scalar, which in this case is $s_{1t}$
• $s_1$ over $N$ time steps is then given by
$$(s_{11}, s_{12}, \ldots, s_{1N}) = (\alpha, \beta) \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \end{pmatrix} = \alpha x_1 + \beta x_2 = w_1^T x$$
• The left-hand side is the source signal $s_1$, and hence $s_1 = w_1^T x$
What about $\cos\theta$ in $s_{1t} = |x_t| \, |w_1| \cos\theta$?
• If $\theta = 90$ degrees, then $\cos\theta = 0$. Thus if $w_1$ and $x_t$ are orthogonal (at 90 degrees to each other), then their inner product is zero.
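The two ways of writing the inner product can be compared numerically. The vectors below are arbitrary values picked for illustration, and the variable names are introduced here; the sketch checks that the algebraic form and the geometric form agree, and that orthogonal vectors give zero.

```matlab
% Inner product: algebraic form versus geometric form.
w1 = [3; 4];                       % weight vector (alpha, beta)'
xt = [2; 1];                       % one mixture data point (x_1t, x_2t)'

ip_algebraic = w1' * xt;           % alpha*x_1t + beta*x_2t
theta = atan2(w1(2), w1(1)) - atan2(xt(2), xt(1));   % angle between the two vectors
ip_geometric = norm(xt) * norm(w1) * cos(theta);     % |x_t| |w_1| cos(theta)

x_orth  = [-4; 3];                 % w1 rotated by 90 degrees, hence orthogonal to w1
ip_orth = w1' * x_orth;            % inner product of orthogonal vectors

fprintf('algebraic = %g, geometric = %g, orthogonal case = %g\n', ...
        ip_algebraic, ip_geometric, ip_orth);
```

The first two values agree (both equal 10 up to rounding), and the orthogonal case gives 0.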

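Mixing & Unmixing Matrix
• Since $s_1 = w_1^T x$, similarly $s_2 = w_2^T x$, where $w_1 = (\alpha, \beta)^T$ and $w_2 = (\gamma, \delta)^T$
MAPPING $x \to s$:
$$\begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1N} \\ s_{21} & s_{22} & \cdots & s_{2N} \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \end{pmatrix}$$
$$(s_1, s_2)^T = (w_1, w_2)^T (x_1, x_2)^T$$
$$s = Wx$$
where $W$ is the unmixing matrix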

Mixing & Unmixing Matrix
• Now mixture $x_1$ is a combination of source signals, and the mixing coefficients define a vector $v_1 = (a, b)^T$:
$$x_1 = a s_1 + b s_2 = (a, b)(s_1, s_2)^T = v_1^T s$$
• Similarly, with $v_2 = (c, d)^T$:
$$x_2 = c s_1 + d s_2 = (c, d)(s_1, s_2)^T = v_2^T s$$

Mixing & Unmixing Matrix
• Therefore
$$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1N} \\ s_{21} & s_{22} & \cdots & s_{2N} \end{pmatrix}$$
$$(x_1, x_2)^T = (v_1, v_2)^T (s_1, s_2)^T$$
$$x = As$$
where $A$ is the mixing matrix
• It can be seen that $W$ reverses, or inverts, the effects of $A$, and hence $W$ could be obtained as $A^{-1}$ if $A$ were known
• However, in ICA nothing more than the mixtures is known, so $W$ cannot be obtained by inverting $A$. The point to be made here is that $W$ and $A$ reverse the effects of each other.
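The relationship between $W$ and $A$ can be illustrated in a toy setting where the mixing matrix is known. This is only a sketch to show that the two matrices undo each other; it is not how ICA works, since ICA has access to the mixtures alone. The sources and coefficients are those of the first code example.

```matlab
% W = inv(A) undoes the mixing when A is known (toy setting only).
v  = 0:100;
s  = [sin(v/2); ((rem(v, 23) - 11) / 9).^5];   % two sources, one per row
A  = [0.7 0.3; 0.8 0.3];                        % mixing matrix, assumed known here
x  = A * s;                                     % mixtures

W     = inv(A);                                 % unmixing matrix for a known A
s_hat = W * x;                                  % recovered sources

fprintf('norm(W*A - I)      = %g\n', norm(W*A - eye(2)));
fprintf('max recovery error = %g\n', max(abs(s_hat(:) - s(:))));
```

Both printed values are zero up to rounding error. In contrast, the W returned by a blind method is only expected to match the inverse of A up to the scaling and sign ambiguities described earlier.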

Recall Linear Transformation
• Recall that the space defined by the source signal axes $S_1$ and $S_2$ is $S$, and the space defined by the mixture axes $X_1$ and $X_2$ is $X$

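Recall Linear Transformation
• Clearly, the point $(0, 1)^T$ lies on the axis $S_2$ in $S$. If we apply the mixing matrix $A$ to this point, we get the transformed axis $S_2'$ in $X$:
$$S_2' = A S_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}$$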

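Axis Transformation
• Similarly
$$S_1' = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix}$$
• Now $A$ can be rewritten as
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = (S_1', S_2')$$
• Critically, changes in the value of source signal $s_1$ induce movement of data points along lines parallel to $S_1$ in $S$, and these same changes also induce movement of data points along lines parallel to the transformed axis $S_1'$ in $X$.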

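Extracting one Source Signal from Two Mixtures
• Consider the signal extracted by $w_1$:
$$w_1^T x = w_1^T (A s) \quad \text{(since } x = As\text{)}$$
$$w_1^T x = w_1^T (S_1', S_2') (s_1, s_2)^T \quad \text{(since } A = (S_1', S_2')\text{)}$$
• Recall that the inner product of two vectors is zero if the vectors are orthogonal. If $w_1 = (\alpha, \beta)^T$ is orthogonal to the transformed axis $S_2' = (b, d)^T$, then:
$$w_1^T S_2' = 0$$
[Figure: weight vector $w_1$ drawn at right angles to the transformed axis $S_2'$]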

Extracting one Source Signal from Two Mixtures
• At the same time, the vectors $S_1'$ and $w_1$ are not orthogonal, and hence their inner product is:
$$w_1^T S_1' = |S_1'| \, |w_1| \cos\theta = k$$
where $\theta$ is the angle between $S_1'$ and $w_1$
• This value of $k$ does not depend on which data point $x_t$ is considered, since $x_t$ never appears in the above equation
• The extracted signal is therefore
$$w_1^T x = (k, 0)(s_1, s_2)^T = k s_1 + 0 \, s_2 = k s_1$$

Extracting one Source Signal from Two Mixtures
Thus, a SCALED VERSION $k s_1$ of the source signal $s_1$ is extracted from the mixture $x$ by taking the inner product of each mixture data point $x_t$ with a vector $w_1$ THAT IS ORTHOGONAL TO THE TRANSFORMED AXIS $S_2'$ in $X$.
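This extraction can be reproduced in a toy setting where the mixing coefficients are known, using the sources and coefficients of the first code example. The choice $w_1 = (d, -b)^T$ below is one convenient vector orthogonal to $S_2' = (b, d)^T$; it is an illustrative choice made here, not something the lecture prescribes, and ICA itself has to find such a vector without knowing $b$ and $d$.

```matlab
% Extracting a scaled copy of s1 with a weight vector orthogonal to S2'.
v  = 0:100;
s1 = sin(v/2);
s2 = ((rem(v, 23) - 11) / 9).^5;
a = 0.7; b = 0.3; c = 0.8; d = 0.3;
A  = [a b; c d];
x  = A * [s1; s2];                 % the two mixtures

S1p = A(:, 1);                     % transformed axis S1' = (a, c)'
S2p = A(:, 2);                     % transformed axis S2' = (b, d)'

w1 = [d; -b];                      % orthogonal to S2', since d*b - b*d = 0
k  = w1' * S1p;                    % constant scale factor, here a*d - b*c

y = w1' * x;                       % extracted signal

fprintf('w1''*S2'' = %g, k = %g\n', w1' * S2p, k);
fprintf('max |y - k*s1| = %g\n', max(abs(y - k*s1)));
```

With these coefficients $k$ is negative and small in magnitude, so the extracted signal is an inverted, attenuated copy of $s_1$, with no contribution from $s_2$.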

Extracting one Source Signal from Two Mixtures
If $k = 1$, we get the exact source signal. However, the fact that $s_1$ is scaled by an unknown constant factor $k$ implies that we may not be able to recover the exact amplitude of each source signal.
THIS HELPS EXPLAIN POINT 2 on SLIDE 7: The output signal amplitude may also be different from that of the original amplitude
