The Forgetron is a kernel-based Perceptron algorithm that keeps the number of stored examples within a fixed budget B while aiming to make few prediction mistakes. On each mistake it applies a three-step update: a standard Perceptron step, a shrinking step that scales the hypothesis down, and a removal step that discards the oldest stored example. The analysis compares the algorithm against an arbitrary competitor g: the Perceptron step makes positive progress toward g, while the deviation caused by the shrinking and removal steps is quantified and kept small, yielding a mistake bound in terms of ||g||^2 and the cumulative hinge loss of g. A hardness result shows that no budgeted algorithm can compete with hypotheses whose norm grows too quickly with the budget. Experiments on MNIST, USPS, synthetic data with label noise, and census-income report online error as a function of the budget.
The Forgetron: A kernel-based Perceptron on a fixed budget
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
The Hebrew University, Jerusalem, Israel

Problem Setting
• Online learning:
  • Choose an initial hypothesis f_0
  • For t = 1, 2, ...
    • Receive an instance x_t and predict sign(f_t(x_t))
    • If y_t f_t(x_t) <= 0, update the hypothesis f
• Goal: minimize the number of prediction mistakes
• Kernel-based hypotheses: f_t is a weighted sum of kernel functions, f_t(x) = Σ_{i ∈ I_t} α_i y_i K(x_i, x), where I_t is the set of active (stored) examples
• Example: the dual Perceptron
  • The coefficient α_i is always 1
  • Initial hypothesis: f_0 ≡ 0 (I_0 is empty)
  • Update rule: on a mistake, add the current example, f_{t+1} = f_t + y_t K(x_t, ·)
• Competitive analysis: compete against any kernel-based hypothesis g and bound the number of mistakes in terms of ||g||^2 and the cumulative hinge loss of g, where the hinge loss on round t is max{0, 1 - y_t g(x_t)}
• Goal: a provably correct algorithm on a budget: |I_t| <= B

The Forgetron
• Initialize: I_1 = ∅ and f_1 ≡ 0
• For t = 1, 2, ...
  • Receive an instance x_t, predict sign(f_t(x_t)), and then receive y_t
  • If y_t f_t(x_t) <= 0, set M_t = M_{t-1} + 1 and update in three steps:
    • Step (1) - Perceptron: f'_t = f_t + y_t K(x_t, ·) and I'_t = I_t ∪ {t}
    • If |I'_t| <= B, skip the next two steps
    • Step (2) - Shrinking: scale the hypothesis by a coefficient φ_t ∈ (0, 1], giving f''_t = φ_t f'_t
    • Step (3) - Removal: define r_t = min I'_t (the oldest active example), remove it from the hypothesis, and set I_{t+1} = I'_t \ {r_t}

Proof Technique
• Progress measure: Δ_t = ||f_t - g||^2 - ||f_{t+1} - g||^2; the sum over t telescopes and can be rewritten as ||f_1 - g||^2 - ||f_{T+1} - g||^2 <= ||g||^2
• The Perceptron update leads to positive progress: on a mistake round, ||f_t - g||^2 - ||f'_t - g||^2 >= 1 - 2 max{0, 1 - y_t g(x_t)}
• The shrinking and removal steps might lead to negative progress. Can we quantify the deviation they might cause?

Deviation due to Shrinking
• Case I: the shrinking is a projection onto the ball of radius U = ||g||; a projection cannot increase the distance to g, so it causes no negative progress
• Case II: aggressive shrinking, where φ_t scales the hypothesis below that projection; define a per-round deviation term for the extra shrinkage and choose φ_t so that the cumulative deviation stays controlled

Deviation due to Removal
• If an example (x, y) is activated on round r and deactivated on round t, then its coefficient has been multiplied by every shrinking factor applied between rounds r and t; by the time it is removed, its coefficient is small, so the removal step causes only a small deviation

Mistake Bound
• For any sequence {(x_1, y_1), ..., (x_T, y_T)} such that K(x_t, x_t) <= 1,
• any budget B >= 84,
• and any competitor g whose norm is at most a threshold U that grows with the budget (on the order of sqrt(B / log B)),
• the number of prediction mistakes is bounded by a constant multiple of ||g||^2 plus a constant multiple of the cumulative hinge loss Σ_t max{0, 1 - y_t g(x_t)}

Hardness Result
• For any kernel-based algorithm that satisfies |I_t| <= B, we can find a competitor g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss
• Proof idea: for each f_t based on at most B examples, there exists x in X such that f_t(x) = 0, so an adversary can always force a mistake
• The norm of the competitor in this construction grows with the budget (on the order of sqrt(B)), so we cannot hope to compete with hypotheses whose norm is much larger than that

Experiments
• Compare to Crammer, Kandola, Singer (NIPS'03)
• Display online error vs. budget constraint
• Figures show online error as a function of the budget on MNIST, USPS, synthetic data (10% label noise), and census-income (adult)
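
To make the dual Perceptron in the Problem Setting concrete, here is a minimal Python sketch. The names KernelPerceptron and gaussian_kernel are introduced here for illustration and are not from the poster; the Gaussian kernel is just one convenient choice that satisfies K(x, x) <= 1.

```python
# A minimal sketch of the dual (kernel) Perceptron from the Problem Setting.
# KernelPerceptron and gaussian_kernel are illustrative names, not from the poster.
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    """RBF kernel; K(x, x) = 1, so the assumption K(x_t, x_t) <= 1 holds."""
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-np.linalg.norm(diff) ** 2 / (2 * sigma ** 2))

class KernelPerceptron:
    def __init__(self, kernel=gaussian_kernel):
        self.kernel = kernel
        self.support = []  # the active set I_t, stored as (x_i, y_i) pairs

    def score(self, x):
        # f_t(x) = sum_{i in I_t} y_i K(x_i, x): the dual coefficient is always 1
        return sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.support)

    def step(self, x_t, y_t):
        """One online round: predict sign(f_t(x_t)), update on a mistake, report it."""
        if y_t * self.score(x_t) <= 0:
            self.support.append((x_t, y_t))  # Perceptron update: activate x_t with weight 1
            return True
        return False
```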
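
The Forgetron's three-step update admits the same kind of sketch. The poster summary does not reveal the exact rule for choosing the shrinking coefficient φ_t, so the code below uses a fixed illustrative value phi (an assumption); the actual algorithm picks φ_t adaptively to keep the cumulative deviation bounded. Only the structure follows the outline above: Perceptron step, shrinking, and removal of the oldest example once the budget B is exceeded.

```python
# A sketch of the Forgetron update on a budget B: (1) Perceptron step, (2) shrinking,
# (3) removal of the oldest active example. The fixed shrinking coefficient phi is an
# illustrative stand-in for the algorithm's adaptive choice of phi_t.
import numpy as np

def linear_kernel(x1, x2):
    return float(np.dot(x1, x2))

class ForgetronSketch:
    def __init__(self, budget, kernel=linear_kernel, phi=0.9):
        self.B = budget
        self.kernel = kernel
        self.phi = phi        # illustrative shrinking coefficient (assumption)
        self.support = []     # active examples as [x_i, y_i, weight_i], in insertion order
        self.mistakes = 0     # the mistake counter M_t

    def score(self, x):
        # f_t(x) = sum over the active set of weight_i * y_i * K(x_i, x)
        return sum(w * y * self.kernel(xi, x) for xi, y, w in self.support)

    def step(self, x_t, y_t):
        """One online round; returns True if the round ended in a prediction mistake."""
        if y_t * self.score(x_t) > 0:
            return False                      # correct prediction: hypothesis unchanged
        self.mistakes += 1
        self.support.append([x_t, y_t, 1.0])  # Step (1) - Perceptron: add x_t with weight 1
        if len(self.support) <= self.B:
            return True                       # still within budget: skip shrinking and removal
        for example in self.support:          # Step (2) - Shrinking: scale the hypothesis by phi
            example[2] *= self.phi
        self.support.pop(0)                   # Step (3) - Removal: drop the oldest example r_t
        return True
```

Feeding the same stream to this sketch with several budget values mirrors the shape of the experiments' online error vs. budget comparison, without reproducing the paper's exact algorithm.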
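
Finally, the quantities in the Mistake Bound and Experiments sections are straightforward to compute for any learner that exposes a step() method like the sketches above; hinge_loss and online_error below are helper names introduced here, not part of the poster.

```python
# Helpers for the quantities appearing in the mistake bound and the experiments.
# Both function names are introduced here for illustration only.

def hinge_loss(margin):
    """Hinge loss of a competitor g on one round, given margin = y_t * g(x_t)."""
    return max(0.0, 1.0 - margin)

def online_error(learner, stream):
    """Fraction of rounds on which learner.step(x_t, y_t) reports a mistake."""
    rounds = list(stream)
    if not rounds:
        return 0.0
    mistakes = sum(1 for x_t, y_t in rounds if learner.step(x_t, y_t))
    return mistakes / len(rounds)
```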