
A PAC-Bayes Risk Bound for General Loss Functions


Presentation Transcript


  1. A PAC-Bayes Risk Bound for General Loss Functions NIPS 2006 Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand Université Laval, Québec, Canada

  2. Summary • We provide a (tight) PAC-Bayesian bound for the expected loss of convex combinations of classifiers under a wide class of loss functions, such as the exponential loss and the logistic loss. • Experiments with AdaBoost indicate that the upper bound (computed on the training set) behaves very similarly to the true loss (estimated on the test set).

  3. Convex Combinations of Classifiers • Consider any set H of {-1, +1}-valued classifiers and any posterior Q on H. • For any input example x, the [-1, +1]-valued output fQ(x) of the convex combination of classifiers is given by fQ(x) = E_{h~Q} [h(x)] = Σ_h Q(h) h(x).

  4. The Margin and WQ(x,y) • WQ(x,y) is the fraction, under measure Q, of classifiers that err on example (x,y): WQ(x,y) = E_{h~Q} [I(h(x) ≠ y)]. • It is related to the margin y fQ(x) by y fQ(x) = 1 − 2 WQ(x,y).
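
A minimal NumPy sketch (not from the original slides) of these two quantities on an invented toy ensemble and posterior; it also checks the margin identity y·fQ(x) = 1 − 2·WQ(x,y).

```python
# Hedged sketch: f_Q, W_Q and the margin identity for a toy posterior Q
# over {-1,+1}-valued classifiers.  All data here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_classifiers, n_examples = 10, 5
H = rng.choice([-1, 1], size=(n_classifiers, n_examples))   # H[h, i] = h(x_i)
Q = rng.dirichlet(np.ones(n_classifiers))                    # posterior weights, sum to 1
y = rng.choice([-1, 1], size=n_examples)                     # labels

f_Q = Q @ H                          # convex combination, values in [-1, +1]
W_Q = Q @ (H != y).astype(float)     # Q-fraction of classifiers erring on (x_i, y_i)

# The margin identity from the slide: y * f_Q(x) = 1 - 2 * W_Q(x, y)
assert np.allclose(y * f_Q, 1 - 2 * W_Q)
print(f_Q, W_Q)
```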

  5. General Loss Functions ζQ(x,y) • Hence, we consider any loss function ζQ(x,y) that can be written as a Taylor series around WQ = ½: ζQ(x,y) = Σ_{k≥0} g(k) (1 − 2 WQ(x,y))^k. • Our task is to provide tight bounds for the expected loss ζQ = E_{(x,y)~D} [ζQ(x,y)] that depend on the empirical loss ζQ^S = (1/m) Σ_{i=1}^m ζQ(x_i, y_i) measured on a training set S of m examples.
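
A small sketch of how such a loss can be evaluated from its Taylor coefficients g(k), together with the constant c = Σ_{k≥1} |g(k)| that appears later in the talk; the exponential-loss coefficients g(k) = (−γ)^k / k! used in the check are an assumption for illustration.

```python
# Hedged sketch: zeta_Q(x,y) = sum_k g(k) * (1 - 2*W_Q(x,y))**k from (truncated)
# Taylor coefficients, plus c = sum_{k>=1} |g(k)|.
import math
import numpy as np

def zeta_from_coeffs(g, W_Q):
    """Evaluate sum_k g[k] * (1 - 2*W_Q)**k for a truncated coefficient sequence g."""
    t = 1.0 - 2.0 * np.asarray(W_Q)
    return sum(gk * t**k for k, gk in enumerate(g))

def c_from_coeffs(g):
    """c = sum_{k>=1} |g(k)| -- the factor that will multiply the Gibbs-risk bound."""
    return sum(abs(gk) for gk in g[1:])

gamma = 2.0
g_exp = [(-gamma) ** k / math.factorial(k) for k in range(40)]  # exp(-gamma * y f_Q), truncated

W = np.linspace(0.0, 1.0, 11)
assert np.allclose(zeta_from_coeffs(g_exp, W), np.exp(-gamma * (1 - 2 * W)))
print("c for the exponential loss:", c_from_coeffs(g_exp), "~ e^gamma - 1 =", math.e**gamma - 1)
```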

  6. Bounds for the Majority Vote • A bound on ζQ also provides a bound on the majority vote, since any non-negative loss with ζQ(x,y) ≥ 1 whenever WQ(x,y) ≥ ½ upper-bounds the zero-one loss of the Q-weighted majority vote BQ, so that R(BQ) ≤ ζQ.
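
A quick numeric check of this domination argument for the exponential loss (illustration only; the loss form is assumed).

```python
# Hedged illustration: a nonnegative loss with zeta >= 1 whenever W_Q >= 1/2 dominates
# the majority vote's zero-one loss pointwise, so E[zeta_Q] >= R(B_Q).
# Checked here for the exponential loss exp(gamma * (2*W_Q - 1)) = exp(-gamma * y f_Q(x)).
import numpy as np

gamma = 1.0
W = np.linspace(0.0, 1.0, 1001)                 # all possible values of W_Q(x, y)
zero_one_majority = (W >= 0.5).astype(float)    # majority vote errs iff W_Q >= 1/2 (ties as errors)
exp_loss = np.exp(gamma * (2 * W - 1))

assert np.all(exp_loss >= zero_one_majority)    # pointwise domination => bound on R(B_Q)
```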

  7. A PAC-Bayes Bound on ζQ

  8. Proof • Since 1 − 2 WQ(x,y) = y fQ(x) = E_{h~Q} [y h(x)], we have (1 − 2 WQ(x,y))^k = E_{h1,…,hk ~ Q^k} [y^k h1(x) h2(x) ⋯ hk(x)], where h1-k denotes the product of k classifiers. Hence ζQ = Σ_{k≥0} g(k) E_{h1,…,hk ~ Q^k} E_{(x,y)~D} [y^k h1-k(x)].
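
A Monte-Carlo sanity check of this identity on an invented single example (illustration, not from the paper).

```python
# Hedged check of: (1 - 2*W_Q(x,y))**k  =  E_{h_1..h_k ~ Q^k} [ y^k * h_1(x)*...*h_k(x) ]
import numpy as np

rng = np.random.default_rng(1)
n_classifiers = 8
Q = rng.dirichlet(np.ones(n_classifiers))
h_outputs = rng.choice([-1, 1], size=n_classifiers)   # h(x) for a single fixed example x
y = 1
k = 3

W_Q = float(Q @ (h_outputs != y))
lhs = (1 - 2 * W_Q) ** k

# Draw k classifiers i.i.d. from Q, many times, and average y^k * h_1(x)*...*h_k(x).
n_samples = 200_000
idx = rng.choice(n_classifiers, size=(n_samples, k), p=Q)
rhs = np.mean(y**k * np.prod(h_outputs[idx], axis=1))

print(lhs, rhs)   # the two values agree up to Monte-Carlo noise
```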

  9. Proof (ctn.) • Let us define the “error rate” R(h1-k) as R(h1-k) = Pr_{(x,y)~D} (h1-k(x) ≠ y^k), so that E_{(x,y)~D} [y^k h1-k(x)] = 1 − 2 R(h1-k). • This lets us relate ζQ to the error rate of a new Gibbs classifier.

  10. Proof (ctn.) • Writing c = Σ_{k≥1} |g(k)|, let Q̄ be a distribution over (sign-adjusted) products of classifiers that works as follows: • A number k is chosen according to |g(k)|/c. • k classifiers in H are chosen according to Q^k. • Then R(GQ̄) denotes the risk of this Gibbs classifier: R(GQ̄) = E_{k ~ |g(k)|/c} E_{h1,…,hk ~ Q^k} [R(h1-k)] (with the product's sign flipped when g(k) < 0).
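
A hedged sketch reconstructing this Gibbs classifier on toy data and verifying, in closed form, that ζQ = g(0) + c·(1 − 2·R(GQ̄)); the exponential-loss coefficients are assumed for the check.

```python
# Hedged reconstruction (my own, under the stated assumptions): the Gibbs classifier
# G_Qbar draws k with probability |g(k)|/c, then k classifiers i.i.d. from Q, and
# predicts with the sign-adjusted product.  Since E_{Q^k}[h_1(x)*...*h_k(x)] = f_Q(x)**k,
# its risk can be computed exactly on toy data, and zeta_Q = g(0) + c*(1 - 2*R(G_Qbar)).
import math
import numpy as np

rng = np.random.default_rng(2)
n_classifiers, n_examples = 10, 50
H = rng.choice([-1, 1], size=(n_classifiers, n_examples))
Q = rng.dirichlet(np.ones(n_classifiers))
y = rng.choice([-1, 1], size=n_examples)

margin = y * (Q @ H)                      # y * f_Q(x) = 1 - 2*W_Q(x, y)

# Assumed loss for the check: exponential loss, g(k) = (-gamma)^k / k!, truncated.
gamma = 1.5
g = [(-gamma) ** k / math.factorial(k) for k in range(40)]
c = sum(abs(gk) for gk in g[1:])

zeta_Q = np.mean(sum(gk * margin**k for k, gk in enumerate(g)))   # expected loss on toy data

# Risk of the Gibbs classifier over sign-adjusted products of classifiers.
R_gibbs = sum(
    (abs(g[k]) / c) * np.mean((1 - np.sign(g[k]) * margin**k) / 2)
    for k in range(1, len(g))
)

assert np.isclose(zeta_Q, g[0] + c * (1 - 2 * R_gibbs))
print("zeta_Q =", zeta_Q, " reconstructed =", g[0] + c * (1 - 2 * R_gibbs))
```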

  11. Proof (ctn.) • The standard PAC-Bayes theorem implies that for any prior P̄ on H* = ∪_{k∈N+} H^k we have, with probability ≥ 1 − δ over the m training examples, for all Q̄: kl(RS(GQ̄) ‖ R(GQ̄)) ≤ [KL(Q̄‖P̄) + ln((m+1)/δ)] / m. • Our theorem follows for any P̄ having the same structure as Q̄ (i.e. k is first chosen according to |g(k)|/c, then k classifiers are chosen according to P^k) since, in that case, we have KL(Q̄‖P̄) = (Σ_{k≥1} k |g(k)|/c) · KL(Q‖P).
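
A sketch of the standard machinery for turning this statement into numbers: inverting the binary KL divergence by bisection (the inputs below are hypothetical, not values from the paper).

```python
# Hedged sketch of standard PAC-Bayes machinery (not code from the paper): the bound
#   kl( R_S(G_Qbar) || R(G_Qbar) ) <= ( KL(Qbar||Pbar) + ln((m+1)/delta) ) / m
# confines the true Gibbs risk to an interval around the empirical risk.
import math

def binary_kl(q, p):
    """kl(q||p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_interval(emp_risk, rhs, tol=1e-10):
    """Smallest and largest p with kl(emp_risk || p) <= rhs, found by bisection."""
    lo, hi = 0.0, emp_risk
    while hi - lo > tol:                       # lower end of the feasible interval
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if binary_kl(emp_risk, mid) <= rhs else (mid, hi)
    lower = hi
    lo, hi = emp_risk, 1.0
    while hi - lo > tol:                       # upper end of the feasible interval
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if binary_kl(emp_risk, mid) <= rhs else (lo, mid)
    upper = lo
    return lower, upper

# Toy, purely illustrative numbers.  With Pbar structured like Qbar, the slide's
# decomposition gives KL(Qbar||Pbar) = (sum_k k*|g(k)|/c) * KL(Q||P).
m, delta = 1000, 0.05
kl_Qbar_Pbar = 2.0 * 5.0                       # hypothetical (sum_k k|g(k)|/c) * KL(Q||P)
rhs = (kl_Qbar_Pbar + math.log((m + 1) / delta)) / m
lower, upper = kl_interval(emp_risk=0.2, rhs=rhs)
print("R(G_Qbar) lies in", (lower, upper))
# Since zeta_Q = g(0) + c*(1 - 2*R(G_Qbar)) decreases in R(G_Qbar), the upper bound
# on the expected loss uses the *lower* end of this interval.
```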

  12. Remark • Since we have ζQ = g(0) + c (1 − 2 R(GQ̄)), any looseness in the bound for R(GQ̄) will be amplified by c in the bound for ζQ. • Hence, the bound on ζQ can be tight only for small c. • This is the case for ζQ(x,y) = |fQ(x) − y|^r, since we have c = 1 for r = 1 and c = 3 for r = 2.
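
A quick arithmetic check of the last claim (my own, not from the slides): since |fQ(x) − y| = 1 − y fQ(x) = 2 WQ(x,y), the expansion of |fQ − y|^r in powers of (1 − 2WQ) has binomial coefficients, giving c = 2^r − 1.

```python
# |f_Q - y|**r = (1 - (1 - 2*W_Q))**r, so g(k) = (-1)**k * C(r, k)
# and c = sum_{k=1}^{r} C(r, k) = 2**r - 1.
from math import comb

for r in (1, 2, 3):
    c = sum(comb(r, k) for k in range(1, r + 1))
    print(f"r = {r}: c = {c}")   # -> 1, 3, 7
```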

  13. Bound Behavior During AdaBoost • Here H is the set of decision stumps. The output h(x) of a decision stump h on attribute x with threshold t is given by h(x) = ±sgn(x − t). • If P(h) = 1/|H| for all h ∈ H, then KL(Q‖P) = ln|H| − H(Q), where H(Q) is the entropy of Q. • H(Q) generally increases at each boosting round.
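
A minimal sketch of the stump classifiers and of the KL term under the uniform prior (toy weights, for illustration only).

```python
# Hedged sketch: decision stumps h(x) = s * sgn(x - t) and KL(Q||P) = ln|H| - H(Q)
# when the prior P is uniform over the |H| stumps carrying the posterior weights Q.
import numpy as np

def stump(x, threshold, sign=+1):
    """A decision stump on a single real-valued attribute x."""
    return sign * np.where(x > threshold, 1, -1)

def kl_to_uniform(Q):
    """KL(Q||P) for a uniform prior P over the stumps carrying the weights Q."""
    Q = np.asarray(Q, dtype=float)
    Q = Q / Q.sum()
    nz = Q[Q > 0]
    entropy = -np.sum(nz * np.log(nz))        # H(Q)
    return np.log(len(Q)) - entropy           # ln|H| - H(Q)

print(stump(np.array([0.2, 0.7, 1.5]), threshold=1.0))
# As Q spreads over more stumps during boosting, H(Q) grows and KL(Q||P) shrinks.
print(kl_to_uniform([1.0, 0.0, 0.0, 0.0]))    # all mass on one stump: KL = ln 4
print(kl_to_uniform([0.4, 0.3, 0.2, 0.1]))    # spread out: smaller KL
```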

  14. Results for the Exponential Loss • For this loss function, ζQ(x,y) = exp(−γ y fQ(x)), we have g(k) = (−γ)^k / k! and hence c = e^γ − 1. • Since c increases exponentially rapidly with γ, so will the risk bound.
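
A short check that c = Σ_{k≥1} γ^k/k! = e^γ − 1 under this assumed parametrization, showing the exponential growth with γ.

```python
# Hedged illustration: with g(k) = (-gamma)**k / k! (an assumption consistent with the
# slide), c = sum_{k>=1} |g(k)| = e**gamma - 1 grows exponentially with gamma.
import math

for gamma in (0.5, 1.0, 2.0, 4.0, 8.0):
    c = sum(gamma**k / math.factorial(k) for k in range(1, 60))
    print(f"gamma = {gamma:4.1f}   c = {c:12.2f}   (e^gamma - 1 = {math.e**gamma - 1:12.2f})")
```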

  15. Exponential Loss Results (ctn.)

  16. Exponential Loss Results (ctn.)

  17. Results for the Sigmoid Loss • For this loss function the relevant expansion is that of tanh. • The Taylor series for tanh(x) converges only for |x| < π/2; we are thus limited to γ < π/2.

  18. Sigmoid Loss Results (ctn.)

  19. Conclusion • We have obtained PAC-Bayesian risk bounds for any loss function ζQ having a convergent Taylor expansion around WQ = ½. • The bound is tight only for small c. • On AdaBoost, the loss bound is basically parallel to the true loss.
