
Agnostically learning halfspaces



  1. Agnostically learning halfspaces (FOCS 2005)

  2. Agnostic learning [Kearns, Schapire & Sellie]
• Set $X$, and let $F$ be a class of functions $f: X \to \{0,1\}$.
• Arbitrary distribution over $(x,y) \in X \times \{0,1\}$; $f^* = \arg\min_{f \in F} \Pr[f(x) \neq y]$ and $\mathrm{opt} = \Pr[f^*(x) \neq y]$.
• Efficient agnostic learner: from $\mathrm{poly}(1/\epsilon)$ samples, outputs (w.h.p.) $h: X \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.

  3. Agnostic learning [Kearns, Schapire & Sellie]
• Set $X_n \subseteq \mathbb{R}^n$, and let $F_n$ be a class of functions $f: X_n \to \{0,1\}$.
• Arbitrary distribution over $(x,y) \in X_n \times \{0,1\}$; $f^* = \arg\min_{f \in F_n} \Pr[f(x) \neq y]$ and $\mathrm{opt} = \Pr[f^*(x) \neq y]$.
• Efficient agnostic learner: from $\mathrm{poly}(n, 1/\epsilon)$ samples, outputs (w.h.p.) $h: X_n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.

  4. Agnostic learning [Kearns, Schapire & Sellie]
• Set $X_n \subseteq \mathbb{R}^n$, and let $F_n$ be a class of functions $f: X_n \to \{0,1\}$.
• Arbitrary distribution over $(x,y) \in X_n \times \{0,1\}$; $f^* = \arg\min_{f \in F_n} \Pr[f(x) \neq y]$.
• Efficient agnostic learner: from $\mathrm{poly}(n, 1/\epsilon)$ samples, outputs (w.h.p.) $h: X_n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
• In the PAC model, $\Pr[f^*(x) \neq y] = 0$; agnostic learning drops that assumption.
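
Spelled out, the definition the slides compress is the standard one; the explicit quantifiers and the confidence parameter $\delta$ below are the usual formulation rather than a quote from the deck:

\textbf{Agnostic learning.} An algorithm agnostically learns $F_n$ if, for every distribution $D$ over $X_n \times \{0,1\}$ and every $\epsilon, \delta > 0$, given $\mathrm{poly}(n, 1/\epsilon, 1/\delta)$ i.i.d. samples from $D$, it outputs with probability at least $1-\delta$ a hypothesis $h: X_n \to \{0,1\}$ with
\[
\Pr_{(x,y)\sim D}[h(x) \neq y] \;\le\; \mathrm{opt} + \epsilon,
\qquad
\mathrm{opt} \;=\; \min_{f \in F_n} \Pr_{(x,y)\sim D}[f(x) \neq y].
\]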

  5. Agnostic learning of halfspaces
• $F_n = \{\, f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \,\}$.
• Goal: output $h: \mathbb{R}^n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$, where $\mathrm{opt} = \Pr[f^*(x) \neq y]$ and $f^* = \arg\min_{f \in F_n} \Pr[f(x) \neq y]$.

  6. Agnostic learning of halfspaces
• $F_n = \{\, f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \,\}$; goal: $h: \mathbb{R}^n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
• Special case: junctions, e.g. $f(x) = x_1 \vee x_3 = I(x_1 + x_3 \ge 1)$ (see the sketch after this slide).
• Efficiently agnostically learning junctions $\Rightarrow$ PAC-learning DNF.
• NP-hard to agnostically learn properly (i.e. when $h$ must itself be a halfspace).
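
A tiny Python sanity check of the junction-as-halfspace example; the coordinates $x_1, x_3$ follow the slide, everything else is illustrative:

```python
# Check that the slide's junction f(x) = x1 OR x3 equals the halfspace
# I(x1 + x3 >= 1) on every Boolean input.
from itertools import product

def disjunction(x):                 # x is a tuple in {0,1}^3
    return int(x[0] == 1 or x[2] == 1)

def halfspace(x):                   # I(w.x >= theta) with w = (1, 0, 1), theta = 1
    return int(x[0] + x[2] >= 1)

assert all(disjunction(x) == halfspace(x) for x in product((0, 1), repeat=3))
print("x1 v x3 agrees with I(x1 + x3 >= 1) on all of {0,1}^3")
```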

  7. Agnostic learning of halfspaces
• $F_n = \{\, f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \,\}$; goal: $h: \mathbb{R}^n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
• PAC learning halfspaces (no noise) is solved by linear programming.

  8. Agnostic learning of halfspaces
• $F_n = \{\, f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \,\}$; goal: $h: \mathbb{R}^n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
• PAC learning halfspaces with independent/random classification noise: also solved [citation shown on the original slide].

  9. Agnostic learning of halfspaces
• $F_n = \{\, f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \,\}$; goal: $h: \mathbb{R}^n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$, where $\mathrm{opt} = \min_{f \in F_n} \Pr[f(x) \neq y]$.
• Equivalently: $f^*$ is the "truth" and the labels are corrupted by adversarial noise of rate $\mathrm{opt}$.

  10. Theorem 1: (w.h.p.) our algorithm outputs $h: \mathbb{R}^n \to \{0,1\}$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$, in time $n^{O(\epsilon^{-4})} = \mathrm{poly}(n)$ for every constant $\epsilon > 0$, as long as the marginal distribution on $x \in \mathbb{R}^n$ is:
• a log-concave distribution, e.g. uniform over a convex set, exponential $e^{-|x|}$, or normal;
• uniform over $\{-1,1\}^n$ or over $S^{n-1} = \{x \in \mathbb{R}^n : |x| = 1\}$;
• …

  11. Two algorithms (see the Python sketch after this slide):
1. $L_1$ polynomial regression algorithm
• Given: $d > 0$ and $(x_1, y_1), \dots, (x_m, y_m) \in \mathbb{R}^n \times \{0,1\}$.
• Find a multivariate degree-$d$ polynomial $p(x)$ minimizing $\sum_i |p(x_i) - y_i|$, i.e. approximately $\min_{\deg(p) \le d} E[\,|p(x) - y|\,]$.
• Pick $\theta \in [0,1]$ at random; output $h(x) = I(p(x) \ge \theta)$.
• Time $n^{O(d)}$.
2. Low-degree Fourier algorithm of [Linial, Mansour & Nisan]
• Choose $p(x) = \sum_{|S| \le d} \hat{c}_S\, \chi_S(x)$, where $\hat{c}_S$ is the empirical estimate of $E[y\,\chi_S(x)]$; this approximately achieves $\min_{\deg(p) \le d} E[(p(x) - y)^2]$ (requires $x$ uniform over $\{-1,1\}^n$).
• Output $h(x) = I(p(x) \ge \tfrac12)$.
• Time $n^{O(d)}$.
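
A minimal Python sketch of the $L_1$ polynomial regression step, assuming scikit-learn's PolynomialFeatures for the degree-$d$ expansion and scipy's linprog for the $L_1$ fit; the solver, data, and noise rate are illustrative choices, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.preprocessing import PolynomialFeatures


def l1_poly_regression(X, y, d, rng=None):
    """Fit a degree-d polynomial p minimizing sum_i |p(x_i) - y_i|, then
    threshold p at a random theta drawn uniformly from [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    poly = PolynomialFeatures(degree=d)
    Phi = poly.fit_transform(X)                 # all monomials of degree <= d
    m, k = Phi.shape
    # Linear program in variables (coefficients c, slacks t >= |Phi c - y|):
    # minimize sum(t) subject to -t <= Phi c - y <= t.
    obj = np.concatenate([np.zeros(k), np.ones(m)])
    A_ub = np.block([[Phi, -np.eye(m)], [-Phi, -np.eye(m)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * k + [(0, None)] * m
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    coeffs, theta = res.x[:k], rng.uniform(0.0, 1.0)
    return lambda X_new: (poly.transform(X_new) @ coeffs >= theta).astype(int)


# Illustrative use: labels from a halfspace in R^2 with 10% of labels flipped.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 2))
y = (X @ np.array([1.0, 0.5]) >= 0).astype(float)
flip = rng.random(400) < 0.1
y[flip] = 1 - y[flip]
h = l1_poly_regression(X, y, d=3, rng=rng)
print("empirical error of h:", np.mean(h(X) != y))
```

The $L_1$ objective is cast as a linear program by the standard slack-variable trick, which keeps the whole step solvable in time polynomial in the $n^{O(d)}$ expanded features.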

  12. ·p lemma of : alg’s error· ½ - (½ - opt)2 + & Sellie 1. L1polynomial regression algorithm ¼ minimizedeg(p)·d E [|p(x)-y|] • Given: d>0,(x1,y1),…,(xm,ym) 2Rn£ {0,1} • Find deg-d p(x) to minimize: • Pick 2 [0,1] at random, output h(x) = I(p(x)¸) multivariate lemma: alg’s error · opt + mindeg(q)·dE [|f*(x)-q(x)|] 2. Low-degree Fourier algorithm of • Chose , where • Outputh(x) = I(p(x)¸½) ¼ minimizedeg(p)·d E [(p(x)-y)2] (requires x uniform from {-1,1}n) time nO(d) lemma: alg’s error·8(opt + mindeg(q)·dE [(f*(x)-q(x))2]) = e y x

  13. Useful properties of log-concave distributions: the projection onto any direction is log-concave, …
• Consequence: the approximation degree is dimension-free for halfspaces; a univariate $q(x) \approx I(x \ge 0)$ of degree $d$ yields $q(w \cdot x) \approx I(w \cdot x \ge 0)$ of the same degree in $n$ dimensions. (The slide shows plots of degree $d = 10$ approximations in both cases.)

  14. Approximating $I(x \ge \theta)$ in one dimension (see the numeric sketch after this slide)
• Bound $\min_{\deg(q) \le d} E[(q(x) - I(x \ge \theta))^2]$.
• Continuous distributions: use the orthogonal polynomials for the inner product $\langle f, g \rangle = E[f(x) g(x)]$.
• Normal: Hermite polynomials. ("Hey, I've used Hermite (pronounced air-meet) polynomials many times.")
• Log-concave (the two-sided exponential density $\tfrac12 e^{-|x|}$ suffices): new polynomials.
• Uniform on the sphere: Gegenbauer polynomials.
• Uniform on the hypercube: Fourier basis.
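
A numeric sketch of the one-dimensional step under the normal distribution, using NumPy's probabilists' Hermite polynomials; the orthogonal-expansion idea is the slide's, while the specific degree, threshold, and Monte Carlo coefficient estimates are illustrative choices:

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials

rng = np.random.default_rng(0)
theta, d = 0.0, 10                              # threshold and approximation degree
x = rng.standard_normal(200_000)
f = (x >= theta).astype(float)                  # the step function I(x >= theta)

# Best degree-d L2 approximation under N(0,1): q = sum_k c_k He_k,
# with c_k = E[f(x) He_k(x)] / k!  (since E[He_j He_k] = k! * [j == k]).
coeffs = np.zeros(d + 1)
for k in range(d + 1):
    basis_k = np.zeros(k + 1)
    basis_k[k] = 1.0                            # coefficient vector selecting He_k
    coeffs[k] = np.mean(f * He.hermeval(x, basis_k)) / factorial(k)

x_test = rng.standard_normal(200_000)
q = He.hermeval(x_test, coeffs)
err = np.mean((q - (x_test >= theta)) ** 2)
print(f"degree-{d} L2 approximation error of I(x >= {theta}): {err:.4f}")
```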

  15. Theorem 2: junctions (e.g. $x_1 \wedge x_{11} \wedge x_{17}$)
• For an arbitrary distribution over $\{0,1\}^n \times \{0,1\}$, the polynomial regression algorithm with $d = O(n^{1/2}\log(1/\epsilon))$ (time $n^{O(d)} = \epsilon^{-O^*(n^{1/2})}$) outputs $h$ with $\Pr[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
• Follows from the previous lemmas + …

  16. How far can we get in $\mathrm{poly}(n, 1/\epsilon)$ time? Assume the distribution draws $x$ uniformly from $S^{n-1} = \{x \in \mathbb{R}^n : |x| = 1\}$.
• Perceptron algorithm: error $\le O(\sqrt{n}) \cdot \mathrm{opt} + \epsilon$.
• We show: the simple averaging algorithm of [citation on the original slide] achieves error $\le O(\log(1/\mathrm{opt})) \cdot \mathrm{opt} + \epsilon$ (a sketch follows this slide).
• Assume $(x, y) \sim (1 - \eta)\,(x, f^*(x)) + \eta\,(\text{arbitrary }(x, y))$: we get error $\le O(n^{1/4} \log(n/\eta)) \cdot \eta + \epsilon$, using Rankin's second bound.
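
The slide names but does not spell out the averaging algorithm; a minimal sketch, assuming the usual label-weighted-mean form $w = \frac{1}{m}\sum_i (2y_i - 1)\,x_i$ for halfspaces through the origin (the data and noise rate are illustrative):

```python
import numpy as np

def averaging_halfspace(X, y):
    """Estimate a halfspace I(w.x >= 0) by averaging label-signed examples."""
    signs = 2 * y - 1                      # map {0,1} labels to {-1,+1}
    w = (signs[:, None] * X).mean(axis=0)  # w = (1/m) sum_i (2y_i - 1) x_i
    return lambda X_new: (X_new @ w >= 0).astype(int)

# Usage: x uniform on S^{n-1}, labels from a halfspace with 10% flipped labels.
rng = np.random.default_rng(0)
n, m = 20, 5000
X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere
w_true = rng.standard_normal(n)
y = (X @ w_true >= 0).astype(int)
flip = rng.random(m) < 0.10
y[flip] = 1 - y[flip]
h = averaging_halfspace(X, y)
print("training error:", np.mean(h(X) != y))
```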

  17. Halfspace conclusions & future work
• $L_1$ polynomial regression: a natural extension of Fourier learning.
• Works for non-uniform/arbitrary distributions.
• Tolerates agnostic noise.
• Works on both continuous and discrete problems.
Future work:
• Handle all distributions (not just log-concave / uniform on $\{-1,1\}^n$).
• Achieve $\mathrm{opt} + \epsilon$ with a $\mathrm{poly}(n, 1/\epsilon)$ algorithm (we have $\mathrm{poly}(n)$ for fixed $\epsilon$, and trivially $\mathrm{poly}(1/\epsilon)$ for fixed $n$).
• Other interesting classes of functions.
