Chapter 5, Part II • 5.3 Spread of Data • 5.4 Fisher Discriminant
Measuring the spread of data • Covariance of two random variables, x and y: cov(x, y) = E[(x - E[x])(y - E[y])] • For zero-mean variables this is the expectation of their product, E[xy] • x, y need to be standardized if they use different units of measurement
Correlation • The covariance of standardized x and y measures their correlation • This treats coordinates independently • In kernel-induced feature space we don’t have access to the coordinates
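A minimal NumPy sketch of why standardization matters: covariance changes with the unit of measurement, while the covariance of the standardized variables (the correlation) does not. The data, the variable names, and the helper functions `covariance` and `correlation` are hypothetical and only serve as an illustration.

```python
import numpy as np

def covariance(x, y):
    """Empirical covariance: mean of the product of the centered variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation(x, y):
    """Covariance after standardizing both variables (unit-free)."""
    return covariance(x, y) / np.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: the same heights expressed in metres and in centimetres
rng = np.random.default_rng(0)
height_m = rng.normal(1.7, 0.1, size=200)
weight_kg = 60 + 50 * (height_m - 1.7) + rng.normal(0, 5, size=200)
height_cm = 100 * height_m

cov_m = covariance(height_m, weight_kg)    # depends on the unit of height
cov_cm = covariance(height_cm, weight_kg)  # 100 times larger
corr_m = correlation(height_m, weight_kg)  # identical for both units
corr_cm = correlation(height_cm, weight_kg)
print(cov_m, cov_cm, corr_m, corr_cm)
```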
Spread in the Feature Space • Consider the l × N matrix X whose rows are the feature vectors φ(x_i)' • Assume zero mean; then the covariance matrix is C = (1/l) X'X
Spread in the Feature Space • Observe that l·C = X'X = Σ_i φ(x_i)φ(x_i)' • Consider a unit vector v; then the value of the projection of φ(x) onto v is v'φ(x)
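Spread in the Feature Space • Variance of the norms of the projections onto v: σ_v^2 = Ê[(v'φ(x))^2] = (1/l) v'X'Xv = v'Cv • Where Ê[f(x)] = (1/l) Σ_i f(x_i) denotes the empirical mean over the training set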
Spread in the Feature Space • So the covariance matrix contains everything needed to calculate the variance of the data along any projection direction • If the data is not centered, subtract the square of the mean projection
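The statement can be checked numerically. The sketch below assumes an explicit feature matrix with one data point per row (random, hypothetical data) and compares the quadratic form in the covariance matrix with the variance of the projected points, for both centered and uncentered data.

```python
import numpy as np

rng = np.random.default_rng(1)
l, N = 500, 3
# Rows are data points; columns are scaled to give different spreads
X = rng.normal(size=(l, N)) @ np.diag([3.0, 1.0, 0.2])

# Arbitrary unit vector giving the projection direction
v = rng.normal(size=N)
v /= np.linalg.norm(v)

# Centered case: variance along v is v' C v with C = X'X / l
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / l
var_from_C = v @ C @ v
var_direct = np.var(Xc @ v)

# Uncentered case: subtract the square of the mean projection
X_shift = X + np.array([5.0, -2.0, 1.0])
proj = X_shift @ v
var_uncentered = v @ (X_shift.T @ X_shift / l) @ v - proj.mean() ** 2
print(var_from_C, var_direct, var_uncentered)
```

All three numbers agree up to floating-point error, since shifting the data does not change the variance of its projections.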
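Variance of Projections • Variance of projections onto a fixed direction v in feature space, using only inner products • v is a linear combination of training points: v = Σ_i α_i φ(x_i) = X'α • Then, with K = XX' the kernel matrix and j the all-1s vector: σ_v^2 = (1/l) α'K^2α - (1/l^2)(α'Kj)^2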
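As a sanity check, the kernel-only expression (1/l) α'K^2α - (1/l^2)(α'Kj)^2 can be compared with the variance computed from explicit coordinates. The sketch assumes a linear kernel K = XX' so that the feature vectors are available for comparison; the data and the coefficient vector `alpha` are random and purely illustrative. Note that v = X'α is not normalized here, so the quantity is the variance of v'φ(x); dividing by α'Kα = ||v||^2 would give the value for the unit vector.

```python
import numpy as np

rng = np.random.default_rng(2)
l, N = 50, 4
X = rng.normal(size=(l, N)) + 1.0  # rows are phi(x_i)'; deliberately not centered
alpha = rng.normal(size=l)         # hypothetical dual coefficients
K = X @ X.T                        # linear kernel, so K = XX'
j = np.ones(l)

# Variance of the projections using only kernel evaluations
var_kernel = alpha @ K @ K @ alpha / l - (alpha @ K @ j / l) ** 2

# Same quantity from explicit coordinates: v = X' alpha, projections X v
v = X.T @ alpha
var_explicit = np.var(X @ v)
print(var_kernel, var_explicit)
```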
Now that we can compute the variance of projections in feature space we can implement a linear classifier • The Fisher discriminant
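Fisher Discriminant • Classification function: f(x) = sgn(w'φ(x) - b) • Where w is chosen to maximize J(w) = (μ+_w - μ-_w)^2 / ((σ+_w)^2 + (σ-_w)^2), with μ+_w (μ-_w) and σ+_w (σ-_w) the mean and standard deviation of the projections w'φ(x) of the positive (negative) examples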
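A small sketch of the Fisher criterion in input space (identity feature map), assuming two hypothetical Gaussian classes that are separated along the first coordinate and noisy along the second. The function name `fisher_criterion` is illustrative, not from the slides.

```python
import numpy as np

def fisher_criterion(w, X, y):
    """(difference of projected class means)^2 / (sum of projected class variances)."""
    proj = X @ w
    pos, neg = proj[y == 1], proj[y == -1]
    return (pos.mean() - neg.mean()) ** 2 / (pos.var() + neg.var())

rng = np.random.default_rng(3)
X_pos = rng.normal(loc=[2.0, 0.0], scale=[1.0, 3.0], size=(100, 2))
X_neg = rng.normal(loc=[-2.0, 0.0], scale=[1.0, 3.0], size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

J_good = fisher_criterion(np.array([1.0, 0.0]), X, y)     # separating direction
J_bad = fisher_criterion(np.array([0.0, 1.0]), X, y)      # noisy direction
J_scaled = fisher_criterion(np.array([10.0, 0.0]), X, y)  # rescaled w, same value
print(J_good, J_bad, J_scaled)
```

The separating direction scores much higher than the noisy one, and rescaling w leaves the quotient unchanged.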
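Regularized Fisher discriminant • Choose w to solve max_w (μ+_w - μ-_w)^2 / ((σ+_w)^2 + (σ-_w)^2 + λ||w||^2) • Quotient is invariant to rescalings of w • Use fixed value C for denominator • Using a Lagrange multiplier ν, the solution is the w that maximizes (μ+_w - μ-_w)^2 - ν((σ+_w)^2 + (σ-_w)^2 + λ||w||^2 - C)
Regularized Fisher discriminant • We then have μ+_w = (1/l+) w'X'j+ and (σ+_w)^2 = (1/l+) w'X'I+Xw - ((1/l+) w'X'j+)^2, and likewise for the negative class; for centered data μ+_w - μ-_w = (l/(2 l+ l-)) w'X'y • Where • y is the vector of labels in {-1, +1} • I+ (I-) is the identity matrix with 1s only in the columns of positive (negative) examples • j+ (j-) is the vector with 1s in the entries of positive (negative) examples and 0s elsewhere • l+ (l-) is the number of positive (negative) examples
Regularized Fisher discriminant • Furthermore, let B = D - C+ - C- • Where D is a diagonal matrix with D_ii = 2l-/l if y_i = +1 and D_ii = 2l+/l if y_i = -1 • And where C+, C- are given by C+_ij = 2l-/(l l+) if y_i = y_j = +1, C-_ij = 2l+/(l l-) if y_i = y_j = -1, and 0 otherwise • So that (σ+_w)^2 + (σ-_w)^2 = (l/(2 l+ l-)) w'X'BXw
Regularized Fisher discriminant • Then the problem becomes max_w (w'X'y)^2 - ν(w'X'BXw + λw'w - C) • With appropriate redefinitions of ν, λ and C • Taking derivatives with respect to w produces X'yy'Xw = ν(X'BXw + λw)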
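The matrix notation can be checked numerically. The sketch below assumes B = D - C+ - C- with D_ii = 2l-/l for positive and 2l+/l for negative examples, C+_ij = 2l-/(l l+) on the positive block and C-_ij = 2l+/(l l-) on the negative block, and verifies that (l/(2 l+ l-)) w'X'BXw equals the sum of the two projected class variances. The helper name `fisher_B` and the data are hypothetical.

```python
import numpy as np

def fisher_B(y):
    """Build B = D - C_pos - C_neg from a label vector with entries in {-1, +1}."""
    y = np.asarray(y)
    l = y.size
    pos = (y == 1).astype(float)   # indicator vector j+
    neg = (y == -1).astype(float)  # indicator vector j-
    l_pos, l_neg = pos.sum(), neg.sum()
    D = np.diag(2 * l_neg / l * pos + 2 * l_pos / l * neg)
    C_pos = 2 * l_neg / (l * l_pos) * np.outer(pos, pos)
    C_neg = 2 * l_pos / (l * l_neg) * np.outer(neg, neg)
    return D - C_pos - C_neg

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))          # rows are feature vectors
y = np.array([1] * 12 + [-1] * 18)    # unbalanced labels on purpose
w = rng.normal(size=3)

l, l_pos, l_neg = y.size, np.sum(y == 1), np.sum(y == -1)
proj = X @ w
within_direct = proj[y == 1].var() + proj[y == -1].var()
within_from_B = l / (2 * l_pos * l_neg) * (w @ X.T @ fisher_B(y) @ X @ w)
print(within_direct, within_from_B)
```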
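Dual expression of w • We can express w in feature space as a linear combination of training samples w = X'α, with α = (1/(λν)) (yy'Xw - νBXw), where B = D - C+ - C- • Substituting w produces λνα = yy'XX'α - νBXX'α • Giving, with K = XX', λνα = yy'Kα - νBKα • This is invariant to rescalings of w and y'Kα is a scalar, so we can rescale α by ν to obtain (BK + λI)α = y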
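Regularized kernel Fisher discriminant • Solution given by α = (BK + λI)^(-1) y • Classification function is f(x) = sgn(k'α - b) • Where k is the vector with entries k(x,xi), i=1,…,l • And b is chosen so that w'μ+ - b = b - w'μ-, with μ+ (μ-) the mean of the positive (negative) examples in feature space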
Regularized kernel Fisher discriminant • Taking w = X'α, we have b = 0.5 α'Kt • where t is the vector with entries t_i = 1/l+ if y_i = +1 and t_i = 1/l- if y_i = -1
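The whole procedure can be sketched in a few lines of NumPy, under these assumptions: the dual solution is α = (BK + λI)^(-1) y with B = D - C+ - C-, the threshold is b = 0.5 α'Kt, and the decision rule is sgn(k'α - b). The Gaussian kernel, its width `gamma`, the value of `lam`, the toy data, and all function names are hypothetical choices for illustration. The kernel is not centered here and the toy classes are balanced.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kfd(K, y, lam=1.0):
    """Solve (BK + lam*I) alpha = y and set the threshold b = 0.5 * alpha' K t."""
    l = y.size
    pos = (y == 1).astype(float)
    neg = (y == -1).astype(float)
    l_pos, l_neg = pos.sum(), neg.sum()
    D = np.diag(2 * l_neg / l * pos + 2 * l_pos / l * neg)
    C_pos = 2 * l_neg / (l * l_pos) * np.outer(pos, pos)
    C_neg = 2 * l_pos / (l * l_neg) * np.outer(neg, neg)
    B = D - C_pos - C_neg
    alpha = np.linalg.solve(B @ K + lam * np.eye(l), y.astype(float))
    t = pos / l_pos + neg / l_neg  # t_i = 1/l+ or 1/l- depending on the label
    b = 0.5 * alpha @ K @ t
    return alpha, b

def predict_kfd(K_new, alpha, b):
    """K_new[i, j] = k(x_new_i, x_train_j); returns labels in {-1, +1}."""
    return np.sign(K_new @ alpha - b)

def make_blobs(rng, n_per_class):
    """Two well-separated Gaussian blobs (toy data)."""
    X_pos = rng.normal(loc=2.0, scale=0.5, size=(n_per_class, 2))
    X_neg = rng.normal(loc=-2.0, scale=0.5, size=(n_per_class, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

rng = np.random.default_rng(5)
X_train, y_train = make_blobs(rng, 40)
X_test, y_test = make_blobs(rng, 20)

K_train = rbf_kernel(X_train, X_train)
alpha, b = fit_kfd(K_train, y_train)
train_acc = np.mean(predict_kfd(K_train, alpha, b) == y_train)
test_acc = np.mean(predict_kfd(rbf_kernel(X_test, X_train), alpha, b) == y_test)
print(train_acc, test_acc)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; the threshold b places the decision boundary midway between the projected class means.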