This chapter covers essential concepts related to the spread of data, focusing on covariance, correlation, and their implications in feature space. It explains why random variables measured in different units must be standardized, how covariance measures correlation, and how to compute the variance of projections in a kernel-induced feature space using only inner products. The Fisher discriminant is then introduced as a linear classifier that maximizes the separation between the class means relative to the within-class variance, with a regularized variant to keep the solution stable in high-dimensional spaces.
Chapter 5, Part II • 5.3 Spread of Data • 5.4 Fisher Discriminant
Measuring the spread of data • Covariance of two random variables x and y: cov(x,y) = E[(x - E[x])(y - E[y])] = E[xy] - E[x]E[y] • For zero-mean variables this is just the expectation of their product, E[xy] • x and y need to be standardized (divided by their standard deviations) if they use different units of measurement
Correlation • The covariance of x and y measures their correlation; after standardizing it becomes the correlation coefficient corr(x,y) = cov(x,y)/(σx σy) • This treats the coordinates independently • In a kernel-induced feature space we don't have access to the coordinates, only to inner products, so we need to express these quantities through the kernel
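As a concrete illustration, here is a minimal numpy sketch of these two quantities; the variables and numbers are invented for the example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)             # e.g. a length measured in metres
y = 0.5 * x + rng.normal(size=200)   # a correlated quantity in other units

# Covariance: expectation of the product of the centred variables
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Dividing by the standard deviations removes the units, giving the
# correlation coefficient, which always lies in [-1, 1]
corr_xy = cov_xy / (x.std() * y.std())
print(cov_xy, corr_xy)
```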
Spread in the Feature Space • Consider the l × N data matrix X whose rows are the training examples • Assume zero mean; then the covariance matrix is C = (1/l) X'X
Spread in the Feature Space • Observe that Xv collects the projections of all training examples onto a direction v • Consider a unit vector v (||v|| = 1); then the value of the projection of an example x onto v is x'v
Spread in the Feature Space • Variance of the norms of the projections onto v: σv^2 = (1/l) Σi (xi'v)^2 = (1/l) ||Xv||^2 = v'Cv • where C = (1/l) X'X is the covariance matrix above
Spread in the Feature Space • So the covariance matrix contains everything needed to calculate the variance of the data along any projection direction: the variance along v is v'Cv • If the data is not centered, subtract the square of the mean projection: σv^2 = (1/l) ||Xv||^2 - ((1/l) j'Xv)^2, where j is the all-1s vector
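The following numpy sketch checks this identity on random uncentred data; the dimensions and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
l, N = 100, 5
X = rng.normal(size=(l, N)) + 1.0   # l x N data matrix, deliberately not centred

v = rng.normal(size=N)
v /= np.linalg.norm(v)              # unit projection direction

C = (X.T @ X) / l                   # second-moment ("covariance") matrix

# Variance along v: v'Cv minus the square of the mean projection
proj = X @ v
var_v = v @ C @ v - proj.mean() ** 2

assert np.isclose(var_v, proj.var())  # matches the direct computation
print(var_v)
```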
Variance of Projections • Variance of projections onto a fixed direction v in feature space, using only inner products • v is a linear combination of training points: v = X'α, with ||v||^2 = α'Kα = 1 • Then Xv = XX'α = Kα, so the variance is (1/l) α'K^2 α - ((1/l) j'Kα)^2, where K = XX' is the kernel matrix
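A sketch of the kernel-only computation; a linear kernel is used so the result can be verified against the explicit feature-space calculation, and α is an arbitrary dual vector.

```python
import numpy as np

rng = np.random.default_rng(0)
l, N = 100, 5
X = rng.normal(size=(l, N))

K = X @ X.T                          # kernel matrix (linear kernel here)
alpha = rng.normal(size=l)
alpha /= np.sqrt(alpha @ K @ alpha)  # normalise so v = X'alpha has unit norm

# Projections of the training points onto v need only the kernel: Xv = K alpha
proj = K @ alpha
var_v = proj @ proj / l - proj.mean() ** 2

# Same quantity computed explicitly in the (here accessible) feature space
assert np.isclose(var_v, (X @ (X.T @ alpha)).var())
print(var_v)
```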
Now that we can compute the variance of projections in feature space, we can implement a linear classifier: the Fisher discriminant
Fisher Discriminant • Classification function: f(x) = sign(w'φ(x) - b) • where w is chosen to maximize the quotient J(w) = (w'μ+ - w'μ-)^2 / ((σ+)^2 + (σ-)^2) • μ+ (μ-) is the mean of the positive (negative) class in feature space, and (σ+)^2 ((σ-)^2) is the variance of the projections of that class onto w
Regularized Fisher discriminant • Choose w to solve max_w J(w) = (w'μ+ - w'μ-)^2 / ((σ+)^2 + (σ-)^2 + λ||w||^2) • The quotient is invariant to rescalings of w, so we can fix the denominator to a constant value C and maximize the numerator • Using a Lagrange multiplier ν, the solution maximizes (w'μ+ - w'μ-)^2 - ν((σ+)^2 + (σ-)^2 + λ||w||^2 - C)
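To make the criterion concrete, here is a small sketch that evaluates the regularized quotient for a candidate direction on toy data; the function name, the data, and the value of λ are all illustrative.

```python
import numpy as np

def fisher_quotient(X, y, w, lam=0.1):
    """Regularised Fisher criterion J(w) for labels y in {-1, +1}."""
    p = X @ w                                    # projections onto w
    mu_pos, mu_neg = p[y == 1].mean(), p[y == -1].mean()
    var_pos, var_neg = p[y == 1].var(), p[y == -1].var()
    return (mu_pos - mu_neg) ** 2 / (var_pos + var_neg + lam * (w @ w))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]

# The direction separating the class means scores higher than an orthogonal one
print(fisher_quotient(X, y, np.array([1.0, 1.0])),
      fisher_quotient(X, y, np.array([1.0, -1.0])))
```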
Regularized Fisher discriminant • We then have matrix expressions for these quantities, e.g. w'μ+ - w'μ- = w'X'(j+/l+ - j-/l-) and (σ+)^2 = (1/l+) ||I+Xw||^2 - (w'μ+)^2 • where • y is the vector of labels {-1, +1} • I+ (I-) is the identity matrix with 1s kept only in the columns corresponding to positive (negative) examples • j+ (j-) is the vector with 1s in the positive (negative) positions and 0s elsewhere, i.e. j+ = I+j (j- = I-j) • l+ (l-) denotes the number of positive (negative) examples
Regularized Fisher discriminant • Furthermore, let B = D - C+ - C- • where D is a diagonal matrix with entries Dii = 2l-/l for positive examples and Dii = 2l+/l for negative examples • and where C+, C- are given by C+ij = 2l-/(l·l+) if yi = yj = +1 (0 otherwise) and C-ij = 2l+/(l·l-) if yi = yj = -1 (0 otherwise) • with these choices, w'X'BXw equals (up to a constant factor) the within-class variance (σ+)^2 + (σ-)^2
Regularized Fisher discriminant • Then, with appropriate redefinitions of ν, λ and C, the problem becomes: maximize w'X'y subject to w'X'BXw + λ||w||^2 = C • Taking derivatives with respect to w produces X'y = ν(X'BXw + λw), showing that the optimal w lies in the span of the training examples
Dual expression of w • We can express w in feature space as a linear combination of training samples, w = X'α, with α = (1/(νλ))(y - νBXw) • Substituting w = X'α produces νλα = y - νBKα • Giving ν(BK + λI)α = y • The function is invariant to rescalings of w, so we can rescale α by ν to obtain (BK + λI)α = y
Regularized kernel Fisher discriminant • Solution given by α = (BK + λI)^(-1) y • Classification function: f(x) = sign(α'k - b) = sign(Σi αi k(x, xi) - b) • where k is the vector with entries k(x, xi), i = 1, …, l • and b is chosen so that w'μ+ - b = b - w'μ-, i.e. b = (1/2) α'K(j+/l+ + j-/l-), placing the threshold midway between the two projected class means
Regularized kernel Fisher discriminant • Taking w = X'α, every quantity in the algorithm is computable from the kernel alone • where K = XX' is the kernel matrix with entries Kij = k(xi, xj), so that w'φ(x) = α'k and ||w||^2 = α'Kα
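Putting the pieces together, here is a runnable sketch of the whole procedure. It is a sketch under stated assumptions, not a definitive implementation: the constants inside D, C+ and C- follow the reconstruction above and should be checked against the source text, the Gaussian kernel with γ = 0.5 and λ = 10^-3 are illustrative choices, and the function names are mine.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_kfd(X, y, lam=1e-3, gamma=0.5):
    """Regularised kernel Fisher discriminant: solve (BK + lam*I) alpha = y."""
    l = len(y)
    lp, ln = (y == 1).sum(), (y == -1).sum()
    K = rbf_kernel(X, X, gamma)
    # B = D - C+ - C-, built from the class indicator structure
    # (the 2l-/l etc. scaling constants are an assumption, as noted above)
    D = np.where(y == 1, 2 * ln / l, 2 * lp / l)
    Cp = np.outer(y == 1, y == 1) * (2 * ln / (l * lp))
    Cn = np.outer(y == -1, y == -1) * (2 * lp / (l * ln))
    B = np.diag(D) - Cp - Cn
    alpha = np.linalg.solve(B @ K + lam * np.eye(l), y.astype(float))
    # b places the threshold midway between the projected class means
    b = 0.5 * alpha @ K @ ((y == 1) / lp + (y == -1) / ln)
    return alpha, b

def predict(alpha, b, X_train, X_test, gamma=0.5):
    k = rbf_kernel(X_test, X_train, gamma)   # rows hold k(x, xi), i = 1..l
    return np.sign(k @ alpha - b)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]
alpha, b = train_kfd(X, y)
print("training accuracy:", (predict(alpha, b, X, X) == y).mean())
```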