
Efficient learning algorithms for changing environments






Presentation Transcript


  1. Efficient learning algorithms for changing environments Elad Hazan and C. Seshadhri (IBM Almaden)

  2. The online learning setting (figure: rounds G1, G2, …, GT)

  3. The online setting (figure: the learner plays x1, x2, …, xT and observes losses f1(x1), f2(x2), …, fT(xT)) • Convex bounded functions • Total loss = ∑t ft(xt) • Adversary chooses any function ft from a fixed family F

  4. Regret (figure: f1, f2, …, fT) • x* = argmin of ∑t ft(x) (fixed optimum in hindsight) • Loss of our algorithm = ∑t ft(xt) • Regret = ∑t ft(xt) − ∑t ft(x*) (standard notion of performance) • Continuum of experts • Online learning problem: design efficient algorithms that attain low regret

  5. Sublinear Regret • We want Regret = o(T) • Why? • Then the loss per round converges to the optimal loss per round • Obviously, we can't compete with the best sequence of points
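The setting on slides 3-5 can be sketched in a few lines. This is a minimal illustration, not the talk's algorithm: it assumes 1-D square losses ft(x) = (x − yt)² on the domain [0, 1] and runs online gradient descent with step size 1/(2t), which for strongly convex losses like these gives O(log T) regret.

```python
import random

# Illustrative sketch of the online convex setting (all names are ours):
# the learner plays x_t, the adversary reveals f_t(x) = (x - y_t)^2,
# and the learner takes a gradient step with step size 1/(2t).
random.seed(0)
T = 1000
ys = [random.uniform(0, 1) for _ in range(T)]

x, total_loss = 0.0, 0.0
for t, y in enumerate(ys, start=1):
    total_loss += (x - y) ** 2        # suffer f_t(x_t)
    grad = 2 * (x - y)
    x -= grad / (2 * t)               # x_{t+1} becomes the running mean of y_1..y_t

x_star = sum(ys) / T                  # fixed optimum in hindsight for square loss
opt_loss = sum((x_star - y) ** 2 for y in ys)
regret = total_loss - opt_loss
print(regret, regret / T)             # regret stays O(log T), so regret/T -> 0
```

Note that with step size 1/(2t) the iterate is exactly the running mean of past yt, i.e. follow-the-leader; this is what makes the per-round loss converge to optimal.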

  6. Portfolio Management • Loss = −∑t log(rt · xt), where xt is the portfolio at time t and rt the vector of stock returns • [HKKA] Efficient algorithms that give O(log T) regret (much smaller than the usual O(√T) regret)

  7. Convergence behaviour (figure: f1, f2, …, fT; iterates x1, x2, x3, x4, …, xT approaching x*) • As t increases, |xt – xt+1| decreases • As t increases, "learning" decreases…? • Does not adapt to the environment

  8. Adapting with time (figure: f1, …, fT/2 with returns f = (1, ½); fT/2+1, …, fT with returns f = (½, 1)) • Optimal fixed portfolio is (½, ½): put equal money on both stocks • Low-regret algorithms will converge to this • But this is terrible! • We want the algorithm to make a switch! • Cannot happen with convergence behaviour

  9. Something better than regret? • [Littlestone-Warmuth, Herbster-Warmuth, Bousquet-Warmuth] study k-shifting optima • Finite expert setting • [Freund-Schapire-Singer-Warmuth] Sleeping experts • [Lehrer, Blum-Mansour] Time selection functions

  10. Adaptive Regret (figure: interval J within rounds x1, x2, x3, …, xT with losses f1, f2, f3, …, fT)

  11. Adaptive Regret (figure: interval J within rounds x1, x2, x3, …, xT with losses f1, f2, f3, …, fT) • Adaptive Regret = maxJ [ ∑t∈J ft(xt) − minx ∑t∈J ft(x) ] • Max regret over all intervals • A different optimum x*J for every interval J • Captures movement of the optimum as time progresses • We want Adaptive Regret = o(T) • In any interval of size ω(AR), the algorithm converges to the optimum
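The definition on slide 11 can be checked by brute force on a toy instance. This sketch (our own illustration, with the learner and losses chosen by us) uses square losses with a halfway switch, as on slide 8, and enumerates all O(T²) intervals, which is only feasible for tiny T:

```python
# Brute-force evaluation of Adaptive Regret on a toy switching environment.
ys = [0.0] * 20 + [1.0] * 20              # environment switches halfway
xs, x = [], 0.5
for t, y in enumerate(ys, start=1):
    xs.append(x)                          # play x_t
    x += (y - x) / t                      # running-mean learner: converges, never switches

def interval_regret(r, s):
    # regret on interval J = {r, ..., s-1} against the best fixed x*_J for J
    x_star = sum(ys[r:s]) / (s - r)
    alg = sum((xs[i] - ys[i]) ** 2 for i in range(r, s))
    opt = sum((x_star - ys[i]) ** 2 for i in range(r, s))
    return alg - opt

T = len(ys)
adaptive_regret = max(interval_regret(r, s)
                      for r in range(T) for s in range(r + 1, T + 1))
print(adaptive_regret)  # large: the converging learner never adapts to the switch
```

The regret on the first half is tiny, but the interval covering the second half contributes a large regret, so the maximum over intervals exposes exactly the failure mode of slide 8.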

  12. Results • We want efficient algorithms to get low Adaptive-Regret for Portfolio Management • Normal regret can be as low as O(log T) • Can we get Adaptive-Regret close to that? • We will deal with a larger class of problems and give general results

  13. FLH • We will describe the algorithm Follow-the-Leading-History • It uses standard low-regret algorithms as a black box • Bootstrapping procedure: converts low regret into low adaptive regret efficiently • Done via a streaming technique

  14. And now for something completely different… • For exp-concave setting (e.g. square loss, portfolio management) – [HKKA]

  15. Other work • [Auer-Cesa Bianchi-Freund-Schapire], [Zinkevich], [Y. Singer] • [Kozat-A. Singer] – independent work in the DSP community • k-shifting results for portfolio management • We give a different, more general technique

  16. Study your history! (figure: losses f1, f2, f3, …, ft; a room of experts, one copy of HKKA started from f1, one from f2, one from f3, …, one from ft; combined prediction xt)

  17. Who to choose? (figure: experts HKKA from f1, from f2, from f3, …, from ft, with the losses of all experts) • Weight wi for each expert (a probability distribution) • Choose an expert according to these weights • After ft is revealed, each wi is updated by a multiplicative factor based on the experts' losses, and then mixed with the uniform distribution (multiplicative update based on Herbster-Warmuth)
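The bookkeeping on slide 17 can be sketched as follows. This is an illustration of the Herbster-Warmuth-style update, not the paper's exact rule: the learning rate eta and mixing rate alpha below are our own illustrative choices.

```python
import math

# Sketch of the slide-17 weight update: cut each expert's weight
# multiplicatively according to its loss, renormalize, then mix with the
# uniform distribution so a newly added expert can still gain weight quickly.
eta, alpha = 1.0, 0.05    # illustrative constants

def weight_update(weights, losses):
    w = [wi * math.exp(-eta * li) for wi, li in zip(weights, losses)]
    z = sum(w)
    w = [wi / z for wi in w]                           # renormalize
    n = len(w)
    return [(1 - alpha) * wi + alpha / n for wi in w]  # mix with uniform

w = [0.25] * 4                              # four experts, uniform start
w = weight_update(w, [0.0, 1.0, 1.0, 1.0])  # expert 0 suffers no loss this round
print(w)  # expert 0 gains weight; every expert keeps at least alpha/n
```

The uniform mixing is what matters for adaptivity: it guarantees every expert, including one just added to the room, retains weight at least alpha/n and can take over within a few rounds.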

  18. Running time problem (figure: interval J; FTL from f1, from f2, from f3, …, from ft) • Regret in J is O(log T) • Adaptive Regret = O(log T) • But Ω(T) experts are needed • Running time = O(RT), since we run Ω(T) FTLs!!

  19. Removing experts (figure: working set) • Stream through the experts • We remove experts • Once removed, they are banished forever • The working set is very dynamic

  20. Working set (figure: times t in St marked along 1, …, t) • St = working set at time t • A subset of [t] • Properties: • St+1 \ St = {t+1} • |St| = O(log t) • Well spread out • [Woodruff] Elegant deterministic construction: a rule for whom to throw out of St to get St+1
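A working set with the slide-20 properties can be sketched as below. This is one rule in the spirit of the deterministic construction cited on the slide; the exact constants (a lifetime of 2^(k+2)+1 rounds) are our illustrative choice, not necessarily those of the paper.

```python
# Sketch of a deterministic working set: write each birth time t = r * 2^k
# with r odd, and keep the expert born at t alive for 2^(k+2) + 1 rounds.
# Highly even times live long, odd times die fast, so at any time only
# O(log t) experts survive, well spread over the past.
def lifetime(t):
    k = 0
    while t % 2 == 0:
        t //= 2
        k += 1
    return 2 ** (k + 2) + 1

def working_set(t):
    # S_t = experts born at some i <= t and still alive at time t
    return [i for i in range(1, t + 1) if i + lifetime(i) > t]

print(len(working_set(1000)), len(working_set(100000)))  # stays O(log t)
```

Each level k contributes only a constant number of survivors (the window of length 2^(k+2)+1 contains O(1) odd multiples of 2^k), and there are O(log t) levels, which gives the |St| = O(log t) bound; additions happen only at time t+1, matching St+1 \ St = {t+1}.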

  21. And therefore… • The working set always has size O(log T) • Running time per step is only O(R log T) • We get O(log² T) Adaptive Regret with O(log T) "copies" of the original low-regret algorithm

  22. To summarize • Defined Adaptive-Regret, a generalization of regret that captures “moving” solutions • Low Adaptive-Regret means we converge to fixed optimum in every interval • Gave bootstrapping algorithm that converts low regret into low Adaptive-Regret (almost optimal) • For (say) portfolio management, what is the right history to look at?

  23. Further directions • Can streaming/sublinear ideas be used for efficiency? • Applications to learning scenarios with cost of shifting • Maybe this technique can be used for online algorithms • Competitive ratio instead of regret • What kind of competitive ratio can these learning techniques give?

  24. Thanks! No, we didn’t make/lose any money playing the stock market with this algorithm…yet.

  25. Tree update problem (figure: binary search tree Bt on [n], access to element at) • Universe = [n] • Binary search tree Bt on [n] • Loss = cost of accessing at in Bt

  26. Tree update problem Universe = [n] Binary search tree Bt on [n]

  27. Tree update problem (figure: rotations producing binary search tree Bt+1) • Universe = [n] • Total cost = total access cost + total rotation cost • [Sleator-Tarjan] Conjecture: splay trees are O(1)-competitive

  28. Tree update problem (figure: binary search tree B*) • Given sequence a1, a2, …, aT • Total cost = total access cost + total rotation cost • Regret = total cost – total cost of B* = o(T) • Regret = o(cost of B*) • Static optimality

  29. For tree update • Given query sequence a1, a2, …, aT, let OPT be the cost of the best tree • [KV] FTL-based approach gives – Total cost = (1 + 1/√T) OPT • Given a contiguous sequence J of queries, OPTJ is the cost of the best tree for J • We get – Cost for J = (1 + 1/T^(1/4)) OPTJ + T^(3/4)

  30. Square loss (figure: ft with points xt, xt+1, yt) • Loss = ∑t (xt – yt)² • Have to pay |xt – xt+1| for moving • Can we get competitive-ratio bounds?

  31. Being lazy • Do we have to update the decision every round? • Could be expensive – e.g. the tree update problem • We can be lazy and make only m updates in total • But we pay an extra T/m in regret • Used to get low Adaptive-Regret for the tree update problem
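The laziness tradeoff on slide 31 can be sketched in a few lines. This is our own illustration, not the paper's procedure: a "full" learner (here, the running mean) is queried only every T/m rounds, so at most m updates are paid in total.

```python
# Sketch of the lazy-update tradeoff: keep playing a stale decision and
# refresh it only at block boundaries of length T/m, so at most m updates
# are ever made, at the price of roughly T/m extra regret.
def lazy_plays(ys, m):
    T = len(ys)
    block = max(1, T // m)
    x, plays = 0.0, []
    for t, y in enumerate(ys, start=1):
        plays.append(x)                 # keep playing the stale decision
        if t % block == 0:
            x = sum(ys[:t]) / t         # recompute only every T/m rounds
    return plays

ys = [float(i % 2) for i in range(100)]
plays = lazy_plays(ys, m=10)
updates = sum(1 for a, b in zip(plays, plays[1:]) if a != b)
print(updates)  # at most m visible decision changes
```

This is exactly the point of the slide for the tree update problem: each decision change is expensive (rotations), so capping the number of updates at m trades rotation cost against T/m additional regret.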

  32. Study your history! (figure: losses f1, f2, f3, …, ft; a room of experts, one copy of FTL started from f1, one from f2, one from f3, …, one from ft; combined prediction xt)

  33. Running time (figure: FTL from f1, from f2, from f3, …, from ft) • Adaptive Regret = O(log T) • But Ω(T) experts are needed • Running time = O(RT), since we run Ω(T) FTLs!!

  34. Removing experts (figure: working set) • Stream through the experts • We remove experts • Once removed, they are banished forever • The working set is very dynamic

  35. Working set (figure: times t in St marked along 1, …, t) • St = working set at time t • A subset of [t] • Properties: • St+1 \ St = {t+1} • |St| = O(log t) • Well spread out

  36. Maintaining experts • [Woodruff] Elegant deterministic construction • Rule on whom to throw out from St to get St+1 • Completely combinatorial working set

  37. And therefore… • We get O(log² T) Adaptive Regret with O(log T) "copies" of the original low-regret algorithm • The same ideas work for general convex functions • Different math, though! • regret with O(log T) copies
