
Efficient learning algorithms for changing environments






Presentation Transcript


  1. Efficient learning algorithms for changing environments Elad Hazan and C. Seshadhri (IBM Almaden)

  2. The online learning setting (figure: rounds G1, G2, …, GT)

  3. The online setting (figure: the learner plays x1, x2, …, xT and observes losses f1(x1), f2(x2), …, fT(xT)) • Convex bounded functions • Total loss = ∑t ft(xt) • Adversary chooses any function ft from a fixed family F

  4. Regret (figure: f1, f2, …, fT) • x* = argmin of ∑t ft(x) (fixed optimum in hindsight) • Loss of our algorithm = ∑t ft(xt) • Regret = ∑t ft(xt) − ∑t ft(x*) (standard notion of performance) • Continuum of experts • Online learning problem: design efficient algorithms that attain low regret

  5. Sublinear Regret • We want Regret = o(T) • Why? • Then the loss per round converges to the optimal loss per round • Obviously, we can't compete with the best sequence of points
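The setting on slides 3-5 can be sketched in a few lines. This is a minimal illustration, not the talk's algorithm: it assumes 1-D square losses ft(x) = (x − yt)² on the domain [0, 1] and runs online gradient descent with step size 1/(2t), which for strongly convex losses like these gives O(log T) regret.

```python
import random

# Illustrative sketch of the online convex setting (all names are ours):
# the learner plays x_t, the adversary reveals f_t(x) = (x - y_t)^2,
# and the learner takes a gradient step with step size 1/(2t).
random.seed(0)
T = 1000
ys = [random.uniform(0, 1) for _ in range(T)]

x, total_loss = 0.0, 0.0
for t, y in enumerate(ys, start=1):
    total_loss += (x - y) ** 2        # suffer f_t(x_t)
    grad = 2 * (x - y)
    x -= grad / (2 * t)               # x_{t+1} becomes the running mean of y_1..y_t

x_star = sum(ys) / T                  # fixed optimum in hindsight for square loss
opt_loss = sum((x_star - y) ** 2 for y in ys)
regret = total_loss - opt_loss
print(regret, regret / T)             # regret stays O(log T), so regret/T -> 0
```

Note that with step size 1/(2t) the iterate is exactly the running mean of past yt, i.e. follow-the-leader; this is what makes the per-round loss converge to optimal.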

  6. Portfolio Management • Loss = −∑t log(rt · xt), where xt is the portfolio at time t and rt the vector of stock returns • [HKKA] Efficient algorithms that give O(log T) regret (much smaller than the usual O(√T) regret)

  7. Convergence behaviour (figure: f1, f2, …, fT; iterates x1, x2, x3, x4, …, xT approaching x*) • As t increases, |xt – xt+1| decreases • As t increases, "learning" decreases…? • Does not adapt to the environment

  8. Adapting with time (figure: f1, …, fT/2 with returns f = (1, ½); fT/2+1, …, fT with returns f = (½, 1)) • Optimal fixed portfolio is (½, ½): put equal money on both stocks • Low-regret algorithms will converge to this • But this is terrible! • We want the algorithm to make a switch! • Cannot happen with convergence behaviour

  9. Something better than regret? • [Littlestone-Warmuth, Herbster-Warmuth, Bousquet-Warmuth] study k-shifting optima • Finite expert setting • [Freund-Schapire-Singer-Warmuth] Sleeping experts • [Lehrer, Blum-Mansour] Time selection functions

  10. Adaptive Regret (figure: interval J within rounds x1, x2, x3, …, xT with losses f1, f2, f3, …, fT)

  11. Adaptive Regret (figure: interval J within rounds x1, x2, x3, …, xT with losses f1, f2, f3, …, fT) • Adaptive Regret = maxJ [ ∑t∈J ft(xt) − minx ∑t∈J ft(x) ] • Max regret over all intervals • A different optimum x*J for every interval J • Captures movement of the optimum as time progresses • We want Adaptive Regret = o(T) • In any interval of size ω(AR), the algorithm converges to the optimum
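The definition on slide 11 can be checked by brute force on a toy instance. This sketch (our own illustration, with the learner and losses chosen by us) uses square losses with a halfway switch, as on slide 8, and enumerates all O(T²) intervals, which is only feasible for tiny T:

```python
# Brute-force evaluation of Adaptive Regret on a toy switching environment.
ys = [0.0] * 20 + [1.0] * 20              # environment switches halfway
xs, x = [], 0.5
for t, y in enumerate(ys, start=1):
    xs.append(x)                          # play x_t
    x += (y - x) / t                      # running-mean learner: converges, never switches

def interval_regret(r, s):
    # regret on interval J = {r, ..., s-1} against the best fixed x*_J for J
    x_star = sum(ys[r:s]) / (s - r)
    alg = sum((xs[i] - ys[i]) ** 2 for i in range(r, s))
    opt = sum((x_star - ys[i]) ** 2 for i in range(r, s))
    return alg - opt

T = len(ys)
adaptive_regret = max(interval_regret(r, s)
                      for r in range(T) for s in range(r + 1, T + 1))
print(adaptive_regret)  # large: the converging learner never adapts to the switch
```

The regret on the first half is tiny, but the interval covering the second half contributes a large regret, so the maximum over intervals exposes exactly the failure mode of slide 8.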

  12. Results • We want efficient algorithms to get low Adaptive-Regret for Portfolio Management • Normal regret can be as low as O(log T) • Can we get Adaptive-Regret close to that? • We will deal with a larger class of problems and give general results

  13. FLH • We will describe the algorithm Follow-the-Leading-History • It uses standard low-regret algorithms as a black box • Bootstrapping procedure: converts low regret into low adaptive regret efficiently • Done via a streaming technique

  14. And now for something completely different… • For exp-concave setting (e.g. square loss, portfolio management) – [HKKA]

  15. Other work • [Auer-Cesa Bianchi-Freund-Schapire], [Zinkevich], [Y. Singer] • [Kozat-A. Singer] – independent work in the DSP community • k-shifting results for portfolio management • We give a different, more general technique

  16. Study your history! (figure: losses f1, f2, f3, …, ft; a room of experts, one copy of HKKA started from f1, one from f2, one from f3, …, one from ft; combined prediction xt)

  17. Who to choose? (figure: experts HKKA from f1, from f2, from f3, …, from ft, with the losses of all experts) • Weight wi for each expert (a probability distribution) • Choose an expert according to these weights • After ft is revealed, each wi is updated by a multiplicative factor based on the experts' losses, and then mixed with the uniform distribution (multiplicative update based on Herbster-Warmuth)
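The bookkeeping on slide 17 can be sketched as follows. This is an illustration of the Herbster-Warmuth-style update, not the paper's exact rule: the learning rate eta and mixing rate alpha below are our own illustrative choices.

```python
import math

# Sketch of the slide-17 weight update: cut each expert's weight
# multiplicatively according to its loss, renormalize, then mix with the
# uniform distribution so a newly added expert can still gain weight quickly.
eta, alpha = 1.0, 0.05    # illustrative constants

def weight_update(weights, losses):
    w = [wi * math.exp(-eta * li) for wi, li in zip(weights, losses)]
    z = sum(w)
    w = [wi / z for wi in w]                           # renormalize
    n = len(w)
    return [(1 - alpha) * wi + alpha / n for wi in w]  # mix with uniform

w = [0.25] * 4                              # four experts, uniform start
w = weight_update(w, [0.0, 1.0, 1.0, 1.0])  # expert 0 suffers no loss this round
print(w)  # expert 0 gains weight; every expert keeps at least alpha/n
```

The uniform mixing is what matters for adaptivity: it guarantees every expert, including one just added to the room, retains weight at least alpha/n and can take over within a few rounds.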

  18. Running time problem (figure: interval J; FTL from f1, from f2, from f3, …, from ft) • Regret in J is O(log T) • Adaptive Regret = O(log T) • But Ω(T) experts are needed • Running time = O(RT), since we run Ω(T) FTLs!!

  19. Removing experts (figure: working set) • Stream through the experts • We remove experts • Once removed, they are banished forever • The working set is very dynamic

  20. Working set (figure: times t in St marked along 1, …, t) • St = working set at time t • A subset of [t] • Properties: • St+1 \ St = {t+1} • |St| = O(log t) • Well spread out • [Woodruff] Elegant deterministic construction: a rule for whom to throw out of St to get St+1
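A working set with the slide-20 properties can be sketched as below. This is one rule in the spirit of the deterministic construction cited on the slide; the exact constants (a lifetime of 2^(k+2)+1 rounds) are our illustrative choice, not necessarily those of the paper.

```python
# Sketch of a deterministic working set: write each birth time t = r * 2^k
# with r odd, and keep the expert born at t alive for 2^(k+2) + 1 rounds.
# Highly even times live long, odd times die fast, so at any time only
# O(log t) experts survive, well spread over the past.
def lifetime(t):
    k = 0
    while t % 2 == 0:
        t //= 2
        k += 1
    return 2 ** (k + 2) + 1

def working_set(t):
    # S_t = experts born at some i <= t and still alive at time t
    return [i for i in range(1, t + 1) if i + lifetime(i) > t]

print(len(working_set(1000)), len(working_set(100000)))  # stays O(log t)
```

Each level k contributes only a constant number of survivors (the window of length 2^(k+2)+1 contains O(1) odd multiples of 2^k), and there are O(log t) levels, which gives the |St| = O(log t) bound; additions happen only at time t+1, matching St+1 \ St = {t+1}.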

  21. And therefore… • The working set always has size O(log T) • Running time per step is only O(R log T) • We get O(log² T) Adaptive Regret with O(log T) "copies" of the original low-regret algorithm

  22. To summarize • Defined Adaptive-Regret, a generalization of regret that captures “moving” solutions • Low Adaptive-Regret means we converge to fixed optimum in every interval • Gave bootstrapping algorithm that converts low regret into low Adaptive-Regret (almost optimal) • For (say) portfolio management, what is the right history to look at?

  23. Further directions • Can streaming/sublinear ideas be used for efficiency? • Applications to learning scenarios with cost of shifting • Maybe this technique can be used for online algorithms • Competitive ratio instead of regret • What kind of competitive ratio can these learning techniques give?

  24. Thanks! No, we didn’t make/lose any money playing the stock market with this algorithm…yet.

  25. Tree update problem (figure: binary search tree Bt on [n], access to element at) • Universe = [n] • Binary search tree Bt on [n] • Loss = cost of accessing at in Bt

  26. Tree update problem Universe = [n] Binary search tree Bt on [n]

  27. Tree update problem (figure: rotations producing binary search tree Bt+1) • Universe = [n] • Total cost = total access cost + total rotation cost • [Sleator-Tarjan] Conjecture: splay trees are O(1)-competitive

  28. Tree update problem (figure: binary search tree B*) • Given sequence a1, a2, …, aT • Total cost = total access cost + total rotation cost • Regret = total cost – total cost of B* = o(T) • Regret = o(cost of B*) • Static optimality

  29. For tree update • Given query sequence a1, a2, …, aT, let OPT be the cost of the best tree • [KV] FTL-based approach gives – Total cost = (1 + 1/√T) OPT • Given a contiguous sequence J of queries, OPTJ is the cost of the best tree for J • We get – Cost for J = (1 + 1/T^(1/4)) OPTJ + T^(3/4)

  30. Square loss (figure: ft with points xt, xt+1, yt) • Loss = ∑t (xt – yt)² • Have to pay |xt – xt+1| for moving • Can we get competitive-ratio bounds?

  31. Being lazy • Do we have to update the decision every round? • Could be expensive – e.g. the tree update problem • We can be lazy and make only m updates in total • But we pay an extra T/m in regret • Used to get low Adaptive-Regret for the tree update problem
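The laziness tradeoff on slide 31 can be sketched in a few lines. This is our own illustration, not the paper's procedure: a "full" learner (here, the running mean) is queried only every T/m rounds, so at most m updates are paid in total.

```python
# Sketch of the lazy-update tradeoff: keep playing a stale decision and
# refresh it only at block boundaries of length T/m, so at most m updates
# are ever made, at the price of roughly T/m extra regret.
def lazy_plays(ys, m):
    T = len(ys)
    block = max(1, T // m)
    x, plays = 0.0, []
    for t, y in enumerate(ys, start=1):
        plays.append(x)                 # keep playing the stale decision
        if t % block == 0:
            x = sum(ys[:t]) / t         # recompute only every T/m rounds
    return plays

ys = [float(i % 2) for i in range(100)]
plays = lazy_plays(ys, m=10)
updates = sum(1 for a, b in zip(plays, plays[1:]) if a != b)
print(updates)  # at most m visible decision changes
```

This is exactly the point of the slide for the tree update problem: each decision change is expensive (rotations), so capping the number of updates at m trades rotation cost against T/m additional regret.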

  32. Study your history! (figure: losses f1, f2, f3, …, ft; a room of experts, one copy of FTL started from f1, one from f2, one from f3, …, one from ft; combined prediction xt)

  33. Running time (figure: FTL from f1, from f2, from f3, …, from ft) • Adaptive Regret = O(log T) • But Ω(T) experts are needed • Running time = O(RT), since we run Ω(T) FTLs!!

  34. Removing experts (figure: working set) • Stream through the experts • We remove experts • Once removed, they are banished forever • The working set is very dynamic

  35. Working set (figure: times t in St marked along 1, …, t) • St = working set at time t • A subset of [t] • Properties: • St+1 \ St = {t+1} • |St| = O(log t) • Well spread out

  36. Maintaining experts • [Woodruff] Elegant deterministic construction • Rule on whom to throw out from St to get St+1 • Completely combinatorial working set

  37. And therefore… • We get O(log² T) Adaptive Regret with O(log T) "copies" of the original low-regret algorithm • The same ideas work for general convex functions • Different math, though! • regret with O(log T) copies
