The Forgetron is a kernel-based Perceptron algorithm that keeps the number of stored examples within a fixed budget B while aiming to make few prediction mistakes. On each mistake it applies a three-step update: a standard Perceptron step, a shrinking step that scales the hypothesis down, and a removal step that discards the oldest stored example. The analysis compares the algorithm against an arbitrary competitor g: the Perceptron step makes positive progress toward g, while the deviation caused by the shrinking and removal steps is quantified and kept small, yielding a mistake bound in terms of ||g||^2 and the cumulative hinge loss of g. A hardness result shows that no budgeted algorithm can compete with hypotheses whose norm grows too quickly with the budget. Experiments on MNIST, USPS, synthetic data with label noise, and census-income report online error as a function of the budget.
The Forgetron: A kernel-based Perceptron on a fixed budget
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
The Hebrew University, Jerusalem, Israel

Problem Setting
• Online learning:
  • Choose an initial hypothesis f_0
  • For t = 1, 2, ...
    • Receive an instance x_t and predict sign(f_t(x_t))
    • If y_t f_t(x_t) <= 0, update the hypothesis f
• Goal: minimize the number of prediction mistakes
• Kernel-based hypotheses: f_t is a weighted sum of kernel functions, f_t(x) = Σ_{i ∈ I_t} α_i y_i K(x_i, x), where I_t is the set of active (stored) examples
• Example: the dual Perceptron
  • The coefficient α_i is always 1
  • Initial hypothesis: f_0 ≡ 0 (I_0 is empty)
  • Update rule: on a mistake, add the current example, f_{t+1} = f_t + y_t K(x_t, ·)
• Competitive analysis: compete against any kernel-based hypothesis g and bound the number of mistakes in terms of ||g||^2 and the cumulative hinge loss of g, where the hinge loss on round t is max{0, 1 - y_t g(x_t)}
• Goal: a provably correct algorithm on a budget: |I_t| <= B

The Forgetron
• Initialize: I_1 = ∅ and f_1 ≡ 0
• For t = 1, 2, ...
  • Receive an instance x_t, predict sign(f_t(x_t)), and then receive y_t
  • If y_t f_t(x_t) <= 0, set M_t = M_{t-1} + 1 and update in three steps:
    • Step (1) - Perceptron: f'_t = f_t + y_t K(x_t, ·) and I'_t = I_t ∪ {t}
    • If |I'_t| <= B, skip the next two steps
    • Step (2) - Shrinking: scale the hypothesis by a coefficient φ_t ∈ (0, 1], giving f''_t = φ_t f'_t
    • Step (3) - Removal: define r_t = min I'_t (the oldest active example), remove it from the hypothesis, and set I_{t+1} = I'_t \ {r_t}

Proof Technique
• Progress measure: Δ_t = ||f_t - g||^2 - ||f_{t+1} - g||^2; the sum over t telescopes and can be rewritten as ||f_1 - g||^2 - ||f_{T+1} - g||^2 <= ||g||^2
• The Perceptron update leads to positive progress: on a mistake round, ||f_t - g||^2 - ||f'_t - g||^2 >= 1 - 2 max{0, 1 - y_t g(x_t)}
• The shrinking and removal steps might lead to negative progress. Can we quantify the deviation they might cause?

Deviation due to Shrinking
• Case I: the shrinking is a projection onto the ball of radius U = ||g||; a projection cannot increase the distance to g, so it causes no negative progress
• Case II: aggressive shrinking, where φ_t scales the hypothesis below that projection; define a per-round deviation term for the extra shrinkage and choose φ_t so that the cumulative deviation stays controlled

Deviation due to Removal
• If an example (x, y) is activated on round r and deactivated on round t, then its coefficient has been multiplied by every shrinking factor applied between rounds r and t; by the time it is removed, its coefficient is small, so the removal step causes only a small deviation

Mistake Bound
• For any sequence {(x_1, y_1), ..., (x_T, y_T)} such that K(x_t, x_t) <= 1,
• any budget B >= 84,
• and any competitor g whose norm is at most a threshold U that grows with the budget (on the order of sqrt(B / log B)),
• the number of prediction mistakes is bounded by a constant multiple of ||g||^2 plus a constant multiple of the cumulative hinge loss Σ_t max{0, 1 - y_t g(x_t)}

Hardness Result
• For any kernel-based algorithm that satisfies |I_t| <= B, we can find a competitor g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss
• Proof idea: for each f_t based on at most B examples, there exists x in X such that f_t(x) = 0, so an adversary can always force a mistake
• The norm of the competitor in this construction grows with the budget (on the order of sqrt(B)), so we cannot hope to compete with hypotheses whose norm is much larger than that

Experiments
• Compare to Crammer, Kandola, Singer (NIPS'03)
• Display online error vs. budget constraint
• Figures show online error as a function of the budget on MNIST, USPS, synthetic data (10% label noise), and census-income (adult)
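
To make the dual Perceptron in the Problem Setting concrete, here is a minimal Python sketch. The names KernelPerceptron and gaussian_kernel are introduced here for illustration and are not from the poster; the Gaussian kernel is just one convenient choice that satisfies K(x, x) <= 1.

```python
# A minimal sketch of the dual (kernel) Perceptron from the Problem Setting.
# KernelPerceptron and gaussian_kernel are illustrative names, not from the poster.
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    """RBF kernel; K(x, x) = 1, so the assumption K(x_t, x_t) <= 1 holds."""
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-np.linalg.norm(diff) ** 2 / (2 * sigma ** 2))

class KernelPerceptron:
    def __init__(self, kernel=gaussian_kernel):
        self.kernel = kernel
        self.support = []  # the active set I_t, stored as (x_i, y_i) pairs

    def score(self, x):
        # f_t(x) = sum_{i in I_t} y_i K(x_i, x): the dual coefficient is always 1
        return sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.support)

    def step(self, x_t, y_t):
        """One online round: predict sign(f_t(x_t)), update on a mistake, report it."""
        if y_t * self.score(x_t) <= 0:
            self.support.append((x_t, y_t))  # Perceptron update: activate x_t with weight 1
            return True
        return False
```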
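
The Forgetron's three-step update admits the same kind of sketch. The poster summary does not reveal the exact rule for choosing the shrinking coefficient φ_t, so the code below uses a fixed illustrative value phi (an assumption); the actual algorithm picks φ_t adaptively to keep the cumulative deviation bounded. Only the structure follows the outline above: Perceptron step, shrinking, and removal of the oldest example once the budget B is exceeded.

```python
# A sketch of the Forgetron update on a budget B: (1) Perceptron step, (2) shrinking,
# (3) removal of the oldest active example. The fixed shrinking coefficient phi is an
# illustrative stand-in for the algorithm's adaptive choice of phi_t.
import numpy as np

def linear_kernel(x1, x2):
    return float(np.dot(x1, x2))

class ForgetronSketch:
    def __init__(self, budget, kernel=linear_kernel, phi=0.9):
        self.B = budget
        self.kernel = kernel
        self.phi = phi        # illustrative shrinking coefficient (assumption)
        self.support = []     # active examples as [x_i, y_i, weight_i], in insertion order
        self.mistakes = 0     # the mistake counter M_t

    def score(self, x):
        # f_t(x) = sum over the active set of weight_i * y_i * K(x_i, x)
        return sum(w * y * self.kernel(xi, x) for xi, y, w in self.support)

    def step(self, x_t, y_t):
        """One online round; returns True if the round ended in a prediction mistake."""
        if y_t * self.score(x_t) > 0:
            return False                      # correct prediction: hypothesis unchanged
        self.mistakes += 1
        self.support.append([x_t, y_t, 1.0])  # Step (1) - Perceptron: add x_t with weight 1
        if len(self.support) <= self.B:
            return True                       # still within budget: skip shrinking and removal
        for example in self.support:          # Step (2) - Shrinking: scale the hypothesis by phi
            example[2] *= self.phi
        self.support.pop(0)                   # Step (3) - Removal: drop the oldest example r_t
        return True
```

Feeding the same stream to this sketch with several budget values mirrors the shape of the experiments' online error vs. budget comparison, without reproducing the paper's exact algorithm.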
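
Finally, the quantities in the Mistake Bound and Experiments sections are straightforward to compute for any learner that exposes a step() method like the sketches above; hinge_loss and online_error below are helper names introduced here, not part of the poster.

```python
# Helpers for the quantities appearing in the mistake bound and the experiments.
# Both function names are introduced here for illustration only.

def hinge_loss(margin):
    """Hinge loss of a competitor g on one round, given margin = y_t * g(x_t)."""
    return max(0.0, 1.0 - margin)

def online_error(learner, stream):
    """Fraction of rounds on which learner.step(x_t, y_t) reports a mistake."""
    rounds = list(stream)
    if not rounds:
        return 0.0
    mistakes = sum(1 for x_t, y_t in rounds if learner.step(x_t, y_t))
    return mistakes / len(rounds)
```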