
Model Selection




Presentation Transcript


  1. Model Selection

  2. Outline • Motivation • Overfitting • Structural Risk Minimization • Cross Validation • Minimum Description Length

  3. Motivation • Suppose we have a class of infinite VC-dimension • We have too few examples • How can we find the best hypothesis? • Alternatively: • Usually we choose the hypothesis class ourselves • How should we go about doing it?

  4. Overfitting • Concept class: Intervals on a line • Can classify any training set • Zero training error: The only goal?!

  5. Overfitting: Intervals • Can always get zero training error • But are we really interested?! • Recall Occam's Razor!

  6. Overfitting: Intervals
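To see the overfitting concretely, here is a minimal sketch (ours, not from the slides) of the union-of-intervals class: placing one tiny interval around each positive training point drives training error to zero for any labeling of distinct points, which is exactly why zero training error alone is a poor goal.

```python
# Union-of-intervals hypothesis: one tiny interval per positive example
# reaches zero training error on ANY labeling of distinct points.

def fit_intervals(points, labels, eps=1e-9):
    """Cover exactly the positive points with tiny intervals."""
    return [(x - eps, x + eps) for x, y in zip(points, labels) if y == 1]

def predict(intervals, x):
    """Positive iff x falls inside some interval."""
    return 1 if any(lo <= x <= hi for lo, hi in intervals) else 0

xs = [0.1, 0.25, 0.4, 0.7, 0.9]
ys = [1, 0, 1, 1, 0]                      # an arbitrary labeling
h = fit_intervals(xs, ys)
train_error = sum(predict(h, x) != y for x, y in zip(xs, ys)) / len(xs)
# train_error is 0.0 -- yet h says nothing useful about unseen points
```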

  7. Overfitting • A simple concept plus noise • can look like a very complex concept • given an insufficient number of examples plus noise

  8. Theoretical Model • Nested hypothesis classes: H1 ⊆ H2 ⊆ H3 ⊆ … ⊆ Hi ⊆ … • Let VC-dim(Hi) = i • For simplicity, |Hi| = 2^i • There is a target function c(x); for some i, c ∈ Hi • True error: e(h) = Pr[h ≠ c] • e_i = min_{h ∈ Hi} e(h) • e* = min_i e_i

  9. Theoretical Model • Training (observed) error: ε_obs(h) = Pr_S[h ≠ c], the fraction of the sample S that h mislabels • ε_obs,i = min_{h ∈ Hi} ε_obs(h) • Complexity of h: d(h) = min {i : h ∈ Hi} • Add a penalty that grows with d(h) • Minimize: ε_obs(h) + penalty(h)

  10. Structural Risk Minimization • Penalty based • Choose the hypothesis which minimizes: ε_obs(h) + penalty(h) • SRM penalty (a standard form for the finite classes |Hi| = 2^i, from Hoeffding's inequality plus a union bound): penalty(h) = sqrt( (d(h)·ln 2 + ln(2/δ)) / 2m )
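A minimal sketch of the SRM rule, assuming the standard penalty for the finite classes |Hi| = 2^i (Hoeffding plus a union bound); the names `srm_penalty`, `srm_select`, and the candidate numbers below are ours, for illustration only.

```python
import math

def srm_penalty(d, m, delta=0.05):
    """Penalty for a hypothesis of complexity d given m examples.
    For finite classes |H_d| = 2**d, Hoeffding plus a union bound gives,
    with probability 1 - delta:
        |e(h) - e_obs(h)| <= sqrt((d*ln2 + ln(2/delta)) / (2m))."""
    return math.sqrt((d * math.log(2) + math.log(2 / delta)) / (2 * m))

def srm_select(candidates, m):
    """candidates: (observed error, complexity d) pairs, one per class.
    SRM picks the pair minimizing observed error + penalty."""
    return min(candidates, key=lambda c: c[0] + srm_penalty(c[1], m))

# Richer classes drive training error down but the penalty up:
cands = [(0.30, 1), (0.12, 3), (0.05, 8), (0.00, 40)]
best = srm_select(cands, m=100)   # a middle-complexity hypothesis wins
```

Note how the zero-training-error candidate loses: its penalty term outweighs the error it saves, which is the point of the penalty-based approach.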

  11. SRM: Performance • Theorem: with probability 1-δ, e(h*) ≤ e(g*) ≤ e(h*) + 2·penalty(h*) • h*: best hypothesis; g*: the SRM choice • Claim: the theorem is "tight" • Example: Hi includes 2^i coins

  12. Proof • Bounding the error in Hi • Bounding the error across Hi

  13. Cross Validation • Separate the sample into a training part and a selection part • Using the training part, select from each Hi a candidate gi • Using the selection sample, select among g1, … , gm • The split sizes: (1-γ)m training set, γm selection set
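The split-and-select procedure above can be sketched as follows (our own toy instance, not from the slides: the candidates g_i are pre-baked predicates, so the "fit" step just returns them).

```python
import random
random.seed(0)

def cv_select(sample, fit, classes, err, gamma=0.3):
    """Cross validation as on the slide: split the m examples into a
    (1-gamma)m training set and a gamma*m selection set, fit one
    candidate g_i per class H_i on the training part, then keep the
    candidate with the smallest error on the held-out selection part."""
    sample = list(sample)
    random.shuffle(sample)
    cut = int((1 - gamma) * len(sample))
    train, select = sample[:cut], sample[cut:]
    candidates = [fit(train, H) for H in classes]
    return min(candidates, key=lambda g: err(g, select))

# Toy data: the target concept is a single interval [0.2, 0.6].
target = lambda x: 1 if 0.2 <= x <= 0.6 else 0
sample = [(x / 20, target(x / 20)) for x in range(20)]
g1 = lambda x: 1 if 0.2 <= x <= 0.6 else 0   # matches the target
g2 = lambda x: 1 if x <= 0.6 else 0          # one-sided threshold
g3 = lambda x: 0                             # always negative
err = lambda g, S: sum(g(x) != y for x, y in S) / len(S)
g_star = cv_select(sample, lambda train, g: g, [g1, g2, g3], err)
```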

  14. Cross Validation: Performance • Errors: e_cv(m) for the CV choice, e_A(m) for the algorithm trained on all m examples • Theorem: with probability 1-δ, e_cv(m) ≤ e_A((1-γ)m) + O( sqrt( log(m/δ) / γm ) ) • Is CV always near-optimal?!

  15. Minimum Description Length • Penalty: the size of h • Related to MAP (maximum a posteriori) • Size of h ≈ -log(Pr[h]) • Errors ≈ -log(Pr[D|h]) • Minimizing total description length ⇔ maximizing Pr[h]·Pr[D|h]
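A sketch of the two-part code behind MDL (the particular encoding below is one common choice, not taken from the slides): describe h, then describe which of the m labels h gets wrong; minimizing the total length trades hypothesis size against errors, mirroring the MAP view above.

```python
import math

def mdl_score(h_bits, num_errors, m):
    """Two-part description length: bits for h, plus bits to name the
    error set -- log2(m+1) bits for its size k and log2 C(m, k) bits
    for which k of the m points are mislabeled."""
    return h_bits + math.log2(m + 1) + math.log2(math.comb(m, num_errors))

# A long, error-free hypothesis vs. a short one with a few mistakes:
m = 100
complex_h = mdl_score(h_bits=300, num_errors=0, m=m)
simple_h = mdl_score(h_bits=20, num_errors=3, m=m)
# The short-but-imperfect description can be cheaper overall
```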
