This chapter covers essential concepts related to the spread of data, focusing on covariance, correlation, and their implications in feature space. It explains why random variables measured in different units must be standardized, how covariance measures correlation, and how to compute the variance of projections in a kernel-induced feature space using only inner products. The Fisher discriminant is then introduced as a linear classifier that maximizes the separation between the class means relative to the within-class variance, with a regularized variant to keep the solution stable in high-dimensional spaces.
Chapter 5, Part II • 5.3 Spread of Data • 5.4 Fisher Discriminant
Measuring the spread of data • Covariance of two random variables x and y: cov(x,y) = E[(x - E[x])(y - E[y])] = E[xy] - E[x]E[y] • For zero-mean variables this is just the expectation of their product, E[xy] • x and y need to be standardized (divided by their standard deviations) if they use different units of measurement
Correlation • The covariance of x and y measures their correlation; after standardizing it becomes the correlation coefficient corr(x,y) = cov(x,y)/(σx σy) • This treats the coordinates independently • In a kernel-induced feature space we don't have access to the coordinates, only to inner products, so we need to express these quantities through the kernel
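As a concrete illustration, here is a minimal numpy sketch of these two quantities; the variables and numbers are invented for the example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)             # e.g. a length measured in metres
y = 0.5 * x + rng.normal(size=200)   # a correlated quantity in other units

# Covariance: expectation of the product of the centred variables
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Dividing by the standard deviations removes the units, giving the
# correlation coefficient, which always lies in [-1, 1]
corr_xy = cov_xy / (x.std() * y.std())
print(cov_xy, corr_xy)
```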
Spread in the Feature Space • Consider the l × N data matrix X whose rows are the training examples • Assume zero mean; then the covariance matrix is C = (1/l) X'X
Spread in the Feature Space • Observe that Xv collects the projections of all training examples onto a direction v • Consider a unit vector v (||v|| = 1); then the value of the projection of an example x onto v is x'v
Spread in the Feature Space • Variance of the norms of the projections onto v: σv^2 = (1/l) Σi (xi'v)^2 = (1/l) ||Xv||^2 = v'Cv • where C = (1/l) X'X is the covariance matrix above
Spread in the Feature Space • So the covariance matrix contains everything needed to calculate the variance of the data along any projection direction: the variance along v is v'Cv • If the data is not centered, subtract the square of the mean projection: σv^2 = (1/l) ||Xv||^2 - ((1/l) j'Xv)^2, where j is the all-1s vector
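The following numpy sketch checks this identity on random uncentred data; the dimensions and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
l, N = 100, 5
X = rng.normal(size=(l, N)) + 1.0   # l x N data matrix, deliberately not centred

v = rng.normal(size=N)
v /= np.linalg.norm(v)              # unit projection direction

C = (X.T @ X) / l                   # second-moment ("covariance") matrix

# Variance along v: v'Cv minus the square of the mean projection
proj = X @ v
var_v = v @ C @ v - proj.mean() ** 2

assert np.isclose(var_v, proj.var())  # matches the direct computation
print(var_v)
```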
Variance of Projections • Variance of projections onto a fixed direction v in feature space, using only inner products • v is a linear combination of training points: v = X'α, with ||v||^2 = α'Kα = 1 • Then Xv = XX'α = Kα, so the variance is (1/l) α'K^2 α - ((1/l) j'Kα)^2, where K = XX' is the kernel matrix
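A sketch of the kernel-only computation; a linear kernel is used so the result can be verified against the explicit feature-space calculation, and α is an arbitrary dual vector.

```python
import numpy as np

rng = np.random.default_rng(0)
l, N = 100, 5
X = rng.normal(size=(l, N))

K = X @ X.T                          # kernel matrix (linear kernel here)
alpha = rng.normal(size=l)
alpha /= np.sqrt(alpha @ K @ alpha)  # normalise so v = X'alpha has unit norm

# Projections of the training points onto v need only the kernel: Xv = K alpha
proj = K @ alpha
var_v = proj @ proj / l - proj.mean() ** 2

# Same quantity computed explicitly in the (here accessible) feature space
assert np.isclose(var_v, (X @ (X.T @ alpha)).var())
print(var_v)
```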
Now that we can compute the variance of projections in feature space, we can implement a linear classifier: the Fisher discriminant
Fisher Discriminant • Classification function: f(x) = sign(w'φ(x) - b) • where w is chosen to maximize the quotient J(w) = (w'μ+ - w'μ-)^2 / ((σ+)^2 + (σ-)^2) • μ+ (μ-) is the mean of the positive (negative) class in feature space, and (σ+)^2 ((σ-)^2) is the variance of the projections of that class onto w
Regularized Fisher discriminant • Choose w to solve max_w J(w) = (w'μ+ - w'μ-)^2 / ((σ+)^2 + (σ-)^2 + λ||w||^2) • The quotient is invariant to rescalings of w, so we can fix the denominator to a constant value C and maximize the numerator • Using a Lagrange multiplier ν, the solution maximizes (w'μ+ - w'μ-)^2 - ν((σ+)^2 + (σ-)^2 + λ||w||^2 - C)
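To make the criterion concrete, here is a small sketch that evaluates the regularized quotient for a candidate direction on toy data; the function name, the data, and the value of λ are all illustrative.

```python
import numpy as np

def fisher_quotient(X, y, w, lam=0.1):
    """Regularised Fisher criterion J(w) for labels y in {-1, +1}."""
    p = X @ w                                    # projections onto w
    mu_pos, mu_neg = p[y == 1].mean(), p[y == -1].mean()
    var_pos, var_neg = p[y == 1].var(), p[y == -1].var()
    return (mu_pos - mu_neg) ** 2 / (var_pos + var_neg + lam * (w @ w))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]

# The direction separating the class means scores higher than an orthogonal one
print(fisher_quotient(X, y, np.array([1.0, 1.0])),
      fisher_quotient(X, y, np.array([1.0, -1.0])))
```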
Regularized Fisher discriminant • We then have matrix expressions for these quantities, e.g. w'μ+ - w'μ- = w'X'(j+/l+ - j-/l-) and (σ+)^2 = (1/l+) ||I+Xw||^2 - (w'μ+)^2 • where • y is the vector of labels {-1, +1} • I+ (I-) is the identity matrix with 1s kept only in the columns corresponding to positive (negative) examples • j+ (j-) is the vector with 1s in the positive (negative) positions and 0s elsewhere, i.e. j+ = I+j (j- = I-j) • l+ (l-) denotes the number of positive (negative) examples
Regularized Fisher discriminant • Furthermore, let B = D - C+ - C- • where D is a diagonal matrix with entries Dii = 2l-/l for positive examples and Dii = 2l+/l for negative examples • and where C+, C- are given by C+ij = 2l-/(l·l+) if yi = yj = +1 (0 otherwise) and C-ij = 2l+/(l·l-) if yi = yj = -1 (0 otherwise) • with these choices, w'X'BXw equals (up to a constant factor) the within-class variance (σ+)^2 + (σ-)^2
Regularized Fisher discriminant • Then, with appropriate redefinitions of ν, λ and C, the problem becomes: maximize w'X'y subject to w'X'BXw + λ||w||^2 = C • Taking derivatives with respect to w produces X'y = ν(X'BXw + λw), showing that the optimal w lies in the span of the training examples
Dual expression of w • We can express w in feature space as a linear combination of training samples, w = X'α, with α = (1/(νλ))(y - νBXw) • Substituting w = X'α produces νλα = y - νBKα • Giving ν(BK + λI)α = y • The function is invariant to rescalings of w, so we can rescale α by ν to obtain (BK + λI)α = y
Regularized kernel Fisher discriminant • Solution given by α = (BK + λI)^(-1) y • Classification function: f(x) = sign(α'k - b) = sign(Σi αi k(x, xi) - b) • where k is the vector with entries k(x, xi), i = 1, …, l • and b is chosen so that w'μ+ - b = b - w'μ-, i.e. b = (1/2) α'K(j+/l+ + j-/l-), placing the threshold midway between the two projected class means
Regularized kernel Fisher discriminant • Taking w = X'α, every quantity in the algorithm is computable from the kernel alone • where K = XX' is the kernel matrix with entries Kij = k(xi, xj), so that w'φ(x) = α'k and ||w||^2 = α'Kα
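Putting the pieces together, here is a runnable sketch of the whole procedure. It is a sketch under stated assumptions, not a definitive implementation: the constants inside D, C+ and C- follow the reconstruction above and should be checked against the source text, the Gaussian kernel with γ = 0.5 and λ = 10^-3 are illustrative choices, and the function names are mine.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_kfd(X, y, lam=1e-3, gamma=0.5):
    """Regularised kernel Fisher discriminant: solve (BK + lam*I) alpha = y."""
    l = len(y)
    lp, ln = (y == 1).sum(), (y == -1).sum()
    K = rbf_kernel(X, X, gamma)
    # B = D - C+ - C-, built from the class indicator structure
    # (the 2l-/l etc. scaling constants are an assumption, as noted above)
    D = np.where(y == 1, 2 * ln / l, 2 * lp / l)
    Cp = np.outer(y == 1, y == 1) * (2 * ln / (l * lp))
    Cn = np.outer(y == -1, y == -1) * (2 * lp / (l * ln))
    B = np.diag(D) - Cp - Cn
    alpha = np.linalg.solve(B @ K + lam * np.eye(l), y.astype(float))
    # b places the threshold midway between the projected class means
    b = 0.5 * alpha @ K @ ((y == 1) / lp + (y == -1) / ln)
    return alpha, b

def predict(alpha, b, X_train, X_test, gamma=0.5):
    k = rbf_kernel(X_test, X_train, gamma)   # rows hold k(x, xi), i = 1..l
    return np.sign(k @ alpha - b)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]
alpha, b = train_kfd(X, y)
print("training accuracy:", (predict(alpha, b, X, X) == y).mean())
```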