No-Regret Algorithms for Online Convex Programs Geoffrey J. Gordon Carnegie Mellon University Presented by Nicolas Chapados 21 February 2007
Outline • Online learning setting • Definition of Regret • Safe Set • Lagrangian Hedging (gradient form) • Lagrangian Hedging (optimization form) • Mention of Theoretical Results • Application: One-Card Poker
Online Learning • Sequence of trials 1, 2, … • At each trial $t$ we must pick a hypothesis $y_t$ • The correct answer is then revealed in the form of a convex loss function $l_t(y_t)$ • Just before seeing the $t$-th example, the total loss is given by $L_t = \sum_{i=1}^{t-1} l_i(y_i)$
Goal of Paper • Introduce Lagrangian Hedging algorithm • Generalization of other algorithms • Hedge (Freund and Schapire) • Weighted Majority (Littlestone and Warmuth) • External-regret Matching (Hart and Mas-Colell) • (CMU Technical Report is much clearer than NIPS paper)
Regret • If we had used a fixed hypothesis $y$, the loss would have been $\sum_{i=1}^{t-1} l_i(y)$ • The regret is the difference between the total loss of the adaptive and fixed hypotheses: $\rho_t(y) = \sum_{i=1}^{t-1} l_i(y_i) - \sum_{i=1}^{t-1} l_i(y)$ • Positive regret means that we should have preferred the fixed hypothesis
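The regret definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the helper names (`total_loss`, `regret`) and the toy squared-error losses are assumptions chosen only to make the definition concrete.

```python
from typing import Callable, List, Sequence

Hypothesis = Sequence[float]
Loss = Callable[[Hypothesis], float]


def total_loss(losses: List[Loss], plays: List[Hypothesis]) -> float:
    """Loss accumulated by the adaptive hypotheses y_1, y_2, ..."""
    return sum(loss(y) for loss, y in zip(losses, plays))


def regret(losses: List[Loss], plays: List[Hypothesis], fixed: Hypothesis) -> float:
    """Adaptive total loss minus the loss of one fixed hypothesis."""
    fixed_loss = sum(loss(fixed) for loss in losses)
    return total_loss(losses, plays) - fixed_loss


# Toy data (hypothetical): convex squared-error losses around targets z.
losses = [lambda y, z=z: (y[0] - z) ** 2 for z in (1.0, 1.0, 0.0)]
plays = [[0.0], [0.5], [1.0]]

# Adaptive loss is 1 + 0.25 + 1 = 2.25; always playing 1.0 costs 0 + 0 + 1 = 1.
example_regret = regret(losses, plays, [1.0])  # 1.25
```

The positive value says that, in hindsight, the fixed hypothesis 1.0 would have been preferable to the sequence actually played.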
Hypothesis Set • Assume that hypothesis set Y is a convex subset of Rd • For example, the simplex of probability distributions • The corners of Y represent pure actions and the middle region a probability distribution over actions
Loss Function • Minimize a linear loss $l_t(y) = c_t \cdot y$, where $c_t$ is the cost vector revealed at trial $t$
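Regret Vector • Keeps the state of the learning algorithm • A vector that keeps information about the actual losses and the gradient of the loss function • Define the regret vector $s_t$ by the recursion $s_{t+1} = s_t + (y_t \cdot c_t)\,u - c_t$, with $s_1 = 0$, where $c_t$ is the gradient of the loss at trial $t$ • $u$ is an arbitrary vector which satisfies $u \cdot y = 1$ for all $y \in Y$ • Example: if $y$ is a probability distribution, then $u$ can be the vector of all ones.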
Use of Regret Vector • Given any hypothesis $y$, we can use the regret vector to compute its regret: $s_t \cdot y$
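A small numerical check may help here. The sketch below assumes linear losses of the form c_t · y and the recursion s_{t+1} = s_t + (y_t · c_t) u − c_t with s_1 = 0, which is one form consistent with the requirement u · y = 1; the function names and the toy cost vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update_regret_vector(s, y, c, u):
    """One step of the assumed recursion s_{t+1} = s_t + (y_t . c_t) u - c_t."""
    incurred = dot(y, c)  # loss actually paid at this trial
    return [si + incurred * ui - ci for si, ui, ci in zip(s, u, c)]


def direct_regret(plays, costs, fixed):
    """Regret of a fixed hypothesis, computed straight from the definition."""
    adaptive = sum(dot(y, c) for y, c in zip(plays, costs))
    return adaptive - sum(dot(fixed, c) for c in costs)


# Toy data (hypothetical): two actions, hypotheses on the probability simplex.
u = [1.0, 1.0]  # u . y = 1 for every probability vector y
plays = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
costs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

s = [0.0, 0.0]
for y, c in zip(plays, costs):
    s = update_regret_vector(s, y, c, u)

# s . y now matches the regret of any fixed hypothesis y in the simplex.
fixed = [0.25, 0.75]
regret_via_s = dot(s, fixed)
regret_direct = direct_regret(plays, costs, fixed)
```

Because the regret of every fixed hypothesis is a dot product with the same vector, the learner only needs to store `s` rather than the whole history of plays and losses.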
Safe Set • Region of the regret space in which the regret is guaranteed to be nonpositive for all hypotheses • The goal of the Lagrangian Hedging algorithm is to keep its regret vector "near" the safe set
Safe Set (continued) • [Figure: the hypothesis set Y and the corresponding safe set S]
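Unnormalized Hypotheses • Consider the cone of unnormalized hypotheses: $\bar{Y} = \{\lambda y \mid y \in Y,\ \lambda \ge 0\}$ • The safe set is a cone that is polar to this cone of unnormalized hypotheses: $S = \{s \mid s \cdot \bar{y} \le 0 \text{ for all } \bar{y} \in \bar{Y}\}$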
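For the probability-simplex example, the cone of unnormalized hypotheses is the nonnegative orthant, so its polar cone (the safe set) is the nonpositive orthant. The sketch below covers only that special case; the function names are hypothetical, and other hypothesis sets would need a different membership test.

```python
import math


def worst_case_regret(s):
    """Largest regret s . y over the simplex, attained at a pure action."""
    return max(s)


def in_safe_set(s):
    """For the simplex, s is safe exactly when every coordinate is <= 0."""
    return worst_case_regret(s) <= 0.0


def distance_to_safe_set(s):
    """Euclidean distance from s to the nonpositive orthant."""
    return math.sqrt(sum(max(si, 0.0) ** 2 for si in s))


safe = in_safe_set([-0.5, -0.5])            # True
unsafe = in_safe_set([0.2, -1.0])           # False
gap = distance_to_safe_set([3.0, -1.0, 4.0])  # 5.0
```

The distance function gives one concrete reading of keeping the regret vector "near" the safe set: only the positive coordinates of `s` contribute to it.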
Lagrangian Hedging (Setting) • At each step, the algorithm chooses its play according to the current regret vector and a closed convex potential function F(s) • Define (sub)gradient of F(s) as f(s) • Potential function is what defines the problem to be solved • E.g. Hedge / Weighted Majority:
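The gradient form can be sketched as a short loop. Several pieces here are assumptions rather than content of the slides: the potential is taken to be F(s) = log Σ_i exp(η s_i), a common way to write the Hedge potential up to an additive constant; the play is the gradient f(s) rescaled so that u · y = 1; losses are linear; and the names `hedge_gradient`, `choose_play` and `run_lagrangian_hedging` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hedge_gradient(s, eta):
    """Gradient of the assumed potential F(s) = log(sum_i exp(eta * s_i))."""
    shift = max(s)  # subtract the max for numerical stability
    weights = [math.exp(eta * (si - shift)) for si in s]
    total = sum(weights)
    return [eta * w / total for w in weights]


def choose_play(s, u, eta):
    """Rescale the gradient f(s) so that the play y satisfies u . y = 1."""
    g = hedge_gradient(s, eta)
    scale = dot(u, g)
    if scale > 0:
        return [gi / scale for gi in g]
    # Arbitrary fallback; never reached with this potential, kept for generality.
    return [1.0 / len(s)] * len(s)


def run_lagrangian_hedging(costs, eta=0.5):
    n = len(costs[0])
    u = [1.0] * n  # hypotheses are probability vectors
    s = [0.0] * n  # regret vector starts at the origin
    plays = []
    for c in costs:
        y = choose_play(s, u, eta)
        plays.append(y)
        incurred = dot(y, c)
        s = [si + incurred * ui - ci for si, ui, ci in zip(s, u, c)]
    return plays, s


# Toy run (hypothetical data): action 0 always costs 1, action 1 costs 0.
costs = [[1.0, 0.0]] * 20
plays, final_s = run_lagrangian_hedging(costs)
```

With this potential the normalized play is a softmax of the regret vector, so the weight on the cheaper action grows from trial to trial.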
Optimization Form • In practice, it may be difficult to define, evaluate and differentiate an appropriate potential function • Optimization form: same pseudo-code as previously, but define F in terms of a simpler hedging function W • Example corresponding to previous F1
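Optimization Form (cont’d) • Then we may obtain F as: $F(s) = \sup_{\bar{y} \in \bar{Y}} \left( s \cdot \bar{y} - W(\bar{y}) \right)$, where $\bar{Y}$ is the cone of unnormalized hypotheses • And the (sub)gradient as: $f(s) = \arg\max_{\bar{y} \in \bar{Y}} \left( s \cdot \bar{y} - W(\bar{y}) \right)$ • Which we may plug into the previous pseudo-code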
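To make the optimization form concrete, here is a minimal sketch under stated assumptions. It takes F(s) to be the supremum of s · ȳ − W(ȳ) over the cone of unnormalized hypotheses, with f(s) the maximizer, and it uses the quadratic hedging function W(ȳ) = ½‖ȳ‖² on the nonnegative orthant. That choice of W is an illustration, not the Hedge example from the slide; it was picked because the maximizer has the closed form max(s, 0) taken coordinate-wise. All function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hedging_function(y_bar):
    """Assumed quadratic hedging function W(y_bar) = 0.5 * ||y_bar||^2."""
    return 0.5 * sum(v * v for v in y_bar)


def objective(s, y_bar):
    """The quantity s . y_bar - W(y_bar) that is maximized over the cone."""
    return dot(s, y_bar) - hedging_function(y_bar)


def gradient_from_optimization(s):
    """Maximizer over the nonnegative orthant: the positive part of s."""
    return [max(si, 0.0) for si in s]


def potential_from_optimization(s):
    """F(s), evaluated by plugging the maximizer back into the objective."""
    return objective(s, gradient_from_optimization(s))


def choose_play(s):
    """Normalize f(s) onto the simplex; fall back to uniform if f(s) = 0."""
    g = gradient_from_optimization(s)
    total = sum(g)
    if total > 0:
        return [gi / total for gi in g]
    return [1.0 / len(s)] * len(s)


s = [2.0, -1.0, 1.0]
f_s = gradient_from_optimization(s)   # [2.0, 0.0, 1.0]
F_s = potential_from_optimization(s)  # 5.0 - 2.5 = 2.5
play = choose_play(s)                 # weights proportional to positive regrets
```

Only the two functions for W and its maximizer would change for a different hedging function; the rest of the loop that updates the regret vector stays the same.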
One-Card Poker • Hypothesis space is the set of sequence weight vectors • These carry information about when it is player i’s turn to move and the actions available at that time • Two players: gambler and dealer • Each player antes $1 and is dealt 1 card from a 13-card deck • Betting order: gambler bets, dealer bets, gambler bets • A player may fold • If neither folds, the player with the highest card wins the pot
Why is it interesting? • Elements of more complicated games: • Incomplete information • Chance events • Multiple stages • Optimal play requires randomization and bluffing