1. Algorithmic Game Theory. Uri Feige, Robi Krauthgamer, Moni Naor. Lecture 8: Regret Minimization. Lecturer: Moni Naor

2. Announcements
• Next Week (Dec 24th): Israeli Seminar on Computational Game Theory (10:00 - 4:30)
• At Microsoft Israel R&D Center, 13 Shenkar St. Herzliya.
• January: the course will meet 13:00-15:00
• The meetings are on Jan 7th, 14th and 21st, 2009

3. The Peanuts Game
• There are n bins.
• At each round, "nature" throws a peanut into one of the bins.
• If you (the player) are at the chosen bin, you catch the peanut.
• Otherwise, you miss it.
• You may choose to move to any bin before any round.
• The game ends when d peanuts have been thrown at some bin, independent of whether they were caught or not.
Goal: catch as many peanuts as possible. This is hopeless if the opponent tosses the peanuts knowing where you stand. Suppose instead that the sequence of bins is predetermined (but unknown). How well can we do as a function of d and n?
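To make the setup concrete, here is a minimal Python sketch of the peanuts game against a predetermined (but unknown to the player) bin sequence. The function peanuts_game and the greedy strategy are illustrative, not part of the lecture.

```python
import random

def peanuts_game(sequence, strategy, n, d):
    """Play the peanuts game against a predetermined bin sequence.

    sequence: iterable of bin indices chosen by 'nature'
    strategy: callable(counts) -> bin to stand at for the next round
    Stops once some bin has received d peanuts; returns peanuts caught.
    """
    counts = [0] * n          # peanuts thrown into each bin so far
    caught = 0
    for b in sequence:
        position = strategy(counts)
        if position == b:
            caught += 1
        counts[b] += 1
        if counts[b] == d:    # game ends when some bin reaches d peanuts
            break
    return caught

# Example: a naive strategy that always stands at the currently fullest bin.
greedy = lambda counts: max(range(len(counts)), key=lambda i: counts[i])
seq = [random.randrange(5) for _ in range(1000)]
print(peanuts_game(seq, greedy, n=5, d=20))
```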

4. Basic setting
• View learning as a sequence of trials.
• In each trial, the algorithm is given x, asked to predict f(x), and is then told the correct value.
• Make no assumptions about how the examples are chosen.
• The goal is to minimize the number of mistakes.
Focus on the number of mistakes: we need to "learn from our mistakes".

5. Using "expert" advice
We want to predict the stock market.
• We solicit n "experts" for their advice: will the market go up or down tomorrow?
• We want to use their advice to make our prediction.
• What is a good strategy for using their opinions, given no a priori knowledge of which expert is best?
• "Expert": someone with an opinion.

6. Some expert is perfect
• We have n "experts".
• One of them is perfect (never makes a mistake); we just don't know which one.
• Can we find a strategy that makes no more than log n mistakes?
Simple algorithm: take the majority vote over all experts that have been completely correct so far. Each of our mistakes eliminates at least half of the remaining experts, so we make at most log n mistakes.
• What if we have a "prior" p over the experts? Can we make no more than log(1/p_i) mistakes, where expert i is the perfect one?
• Take a weighted vote according to p.
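A minimal Python sketch of this halving idea, assuming 0/1 predictions; the names halving_predict and halving_run are hypothetical, not from the lecture.

```python
def halving_predict(predictions, alive):
    """Majority vote among the experts that have never erred ('alive')."""
    votes = sum(predictions[i] for i in alive)
    return 1 if 2 * votes >= len(alive) else 0

def halving_run(expert_predictions, labels):
    """expert_predictions[t][i]: expert i's 0/1 prediction at trial t."""
    n = len(expert_predictions[0])
    alive = set(range(n))
    mistakes = 0
    for preds, y in zip(expert_predictions, labels):
        if halving_predict(preds, alive) != y:
            mistakes += 1
        # Keep only the experts that were correct on this trial.
        alive = {i for i in alive if preds[i] == y}
    return mistakes  # at most log2(n) if some expert is always right
```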

7. Relation to concept learning
• If computation time is no object, we can have one "expert" per concept in C.
• If the target is in C, then the number of mistakes is at most log|C|.
• More generally, for any description language, the number of mistakes is at most the number of bits needed to write down f.

8. What if no expert is perfect?
Goal: do nearly as well as the best expert in hindsight.
Strategy #1: the iterated halving algorithm.
• Same as before, but once we've crossed off all the experts, restart from the beginning.
• Makes at most log(n)·OPT mistakes, where OPT is the number of mistakes of the best expert in hindsight: the best expert makes at least one mistake in each "epoch", and the algorithm makes at most log n per epoch.
• Seems wasteful: we are constantly forgetting what we've "learned".

9. Weighted Majority Algorithm
Intuition: making a mistake doesn't completely disqualify an expert. So, instead of crossing it off, just lower its weight.
Weighted Majority Algorithm:
• Start with all experts having weight 1.
• Predict based on a weighted majority vote.
• Penalize mistakes by cutting the expert's weight in half.

10. Weighted Majority Algorithm: example
Example: how the weights evolve over Day 1 through Day 4 (the table from the slide is not reproduced in this transcript).
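A small Python sketch of the deterministic Weighted Majority rule for 0/1 predictions; the function weighted_majority and its input format are illustrative assumptions.

```python
def weighted_majority(expert_predictions, labels):
    """Deterministic Weighted Majority: predict by weighted vote and
    halve the weight of every expert that errs."""
    n = len(expert_predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_predictions, labels):
        up = sum(wi for wi, p in zip(w, preds) if p == 1)
        down = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if up >= down else 0
        if guess != y:
            mistakes += 1
        # Penalize every expert that was wrong on this trial.
        w = [wi * (0.5 if p != y else 1.0) for wi, p in zip(w, preds)]
    return mistakes
```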

11. Analysis: not far from the best expert in hindsight
• M = # mistakes we've made so far.
• b = # mistakes the best expert has made so far.
• W = total weight. Initially W = n.
• After each of our mistakes, W drops by at least 25% (at least half of the weight voted wrongly and gets halved). So after M mistakes, W ≤ n(3/4)^M.
• The weight of the best expert is (1/2)^b. So (1/2)^b ≤ n(3/4)^M, giving
  M ≤ (b + log₂ n)/log₂(4/3) ≈ 2.4(b + log₂ n): a constant competitive ratio.

12. Randomized Weighted Majority
If the best expert makes a mistake 20% of the time, then a bound of roughly 2.4(b + log n) is not so good. Can we do better?
• Instead of a majority vote, use the weights as probabilities: if 70% of the weight is on "up" and 30% on "down", predict up with probability 0.7 and down with probability 0.3.
• Idea: smooth out the worst case.
• Also, generalize the penalty factor 1/2 to 1-ε.

13. Randomized Weighted Majority
/* Initialization */
  W_i ← 1 for i ∈ 1..n
/* Main loop */
  For t = 1..T:
    Let P_t(i) = W_i / (Σ_{j=1..n} W_j)
    Choose i according to P_t
    /* Update scores */
    Observe the losses; for i ∈ 1..n:
      W_i ← W_i · (1-ε)^loss(i)
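A runnable Python version of the pseudocode above, assuming full-information feedback (every expert's loss in [0,1] is observed each round); the name rwm and the loss_matrix input format are illustrative.

```python
def rwm(loss_matrix, eps):
    """Randomized Weighted Majority over n experts for T rounds.

    loss_matrix[t][i]: the loss (in [0,1]) of expert i at round t.
    Returns the algorithm's total expected loss.
    """
    n = len(loss_matrix[0])
    w = [1.0] * n
    total = 0.0
    for losses in loss_matrix:
        z = sum(w)
        p = [wi / z for wi in w]                    # P_t(i) = W_i / sum_j W_j
        total += sum(pi * li for pi, li in zip(p, losses))  # expected loss this round
        w = [wi * (1 - eps) ** li for wi, li in zip(w, losses)]  # multiplicative update
    return total
```

For the prediction setting above, loss(i) at round t is simply 1 if expert i erred and 0 otherwise.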

14. Analysis
• Say that at time t a fraction F_t of the weight is on experts that made a mistake.
• Then we make a mistake with probability F_t, and we remove an εF_t fraction of the total weight:
  W_final = n(1 - εF_1)(1 - εF_2)…
• ln(W_final) = ln(n) + Σ_t ln(1 - εF_t) ≤ ln(n) - ε Σ_t F_t   (using ln(1-x) ≤ -x)
  = ln(n) - εM,   since Σ_t F_t = E[# mistakes] = M.
• If the best expert makes b mistakes, then ln(W_final) ≥ ln((1-ε)^b) = b ln(1-ε).
• Now solve ln(n) - εM ≥ b ln(1-ε):
  M ≤ b ln(1-ε)/(-ε) + ln(n)/ε ≤ b(1+ε) + ln(n)/ε,
  using -ln(1-x) ≤ x + x² for 0 ≤ x ≤ 1/2.
• Here M = expected # of our mistakes, b = # mistakes of the best expert, W = total weight.

15. Summarizing: RWM
• RWM is (1+ε)-competitive with the best expert in hindsight, with an additive ε⁻¹ ln(n):
  M ≤ b(1+ε) + ln(n)/ε.
• If running for T time steps, set ε = (ln n / T)^{1/2} to get
  M ≤ b(1 + (ln n / T)^{1/2}) + ln(n)/(ln n / T)^{1/2}
    = b + (b² ln n / T)^{1/2} + (ln(n)·T)^{1/2}
    ≤ b + 2(ln(n)·T)^{1/2}   (additive loss).
• M = # mistakes made, b = # mistakes the best expert made.
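As a small illustration of the tuning rule above (assuming the horizon T is known in advance), one might set ε as follows; the cap at 1/2 keeps the earlier analysis valid. This helper is an illustrative sketch, not from the lecture.

```python
import math

def tuned_eps(n, T):
    """Tuning for RWM: eps = sqrt(ln(n)/T), capped at 1/2."""
    return min(0.5, math.sqrt(math.log(n) / T))
```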

16. Questions
• Isn't it sometimes better to take the majority?
• The best expert may have a hard time on an easy question, and we would be better off using the wisdom of the crowds.
• Answer: if that is a good idea, make the majority vote itself one of the experts!

17. Lower Bounds
We cannot hope to do better than:
• log n
• T^{1/2}

18. What can we use this for?
• We can combine multiple algorithms so as to do nearly as well as the best one in hindsight.
• E.g., online auctions: one expert per price level.
• Play a repeated game so as to do nearly as well as the best fixed strategy in hindsight: regret minimization.
• Extensions: the "bandit" problem, movement costs.

19. "No-regret" algorithms for repeated games
Repeated play of a matrix game with N rows. (The algorithm is the row player; rows represent its different possible actions. The column player is the adversary / world / life.)
• At each step t: the algorithm picks a row i_t; life picks a column.
• The algorithm pays the cost of the action chosen: C_t(i_t).
• The algorithm gets the whole column C_t as feedback (or just its own cost, in the "bandit" model).
• Assume a bound on the maximum cost: all costs are between 0 and 1.

20. "No-regret" algorithms for repeated games
• At each time step, the algorithm picks a row, life picks a column.
• The algorithm pays the cost of the action chosen: C_t(i_t).
• The algorithm gets the column as feedback.
• Assume a bound on the maximum cost: all costs are between 0 and 1.
Define the average regret in T time steps as
  [avg cost of alg] - [avg cost of best fixed row in hindsight]
  = 1/T · [ Σ_{t=1..T} C_t(i_t) - min_i Σ_{t=1..T} C_t(i) ].
We want this to go to 0 (or better) as T gets large: a "no-regret" algorithm.
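A direct Python translation of this definition; the name average_regret and the costs/plays input format are assumptions for illustration.

```python
def average_regret(costs, plays):
    """Average regret after T rounds.

    costs[t][i]: cost of row i at round t (all in [0, 1]);
    plays[t]: the row the algorithm actually chose at round t.
    """
    T = len(costs)
    n = len(costs[0])
    alg_cost = sum(costs[t][plays[t]] for t in range(T))
    best_fixed = min(sum(costs[t][i] for t in range(T)) for i in range(n))
    return (alg_cost - best_fixed) / T
```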

21. Randomized Weighted Majority for repeated games
/* Initialization */
  W_i ← 1 for i ∈ 1..n
/* Main loop */
  For t = 1..T:
    Let P_t(i) = W_i / (Σ_{j=1..n} W_j)
    Choose i according to P_t
    /* Update scores */
    Observe the column C_t; for i ∈ 1..n:
      W_i ← W_i · (1-ε)^C_t(i)

22. Analysis
• Similar to the {0,1} case:
  E[cost of RWM] ≤ (min_i Σ_{t=1..T} C_t(i)) / (1-ε) + ln(n)/ε
                ≤ (min_i Σ_{t=1..T} C_t(i)) · (1+2ε) + ln(n)/ε,   for ε ≤ 1/2.
• No regret: as T grows, the difference from the best fixed row goes to 0.

23. Properties of no-regret algorithms
• Time-average performance is guaranteed to approach the minimax value V of the game, or better if life isn't adversarial.
• Two no-regret algorithms playing against each other: their empirical distributions of play approach minimax optimal strategies.
• The existence of no-regret algorithms yields a proof of the minimax theorem.

24. von Neumann's Minimax Theorem
• Zero-sum game: u2(a1, a2) = -u1(a1, a2).
Theorem: for any two-player zero-sum game with finite strategy sets A1, A2 there is a value v ∈ R, the game value, such that
  v = max_{p ∈ Δ(A1)} min_{q ∈ Δ(A2)} u1(p,q) = min_{q ∈ Δ(A2)} max_{p ∈ Δ(A1)} u1(p,q).
• For all mixed Nash equilibria (p,q): u1(p,q) = v.
• Δ(A) = the set of mixed strategies (distributions) over A.

25. Convergence to Minimax
• Suppose we know v = max_{p ∈ Δ(A1)} min_{q ∈ Δ(A2)} u1(p,q) = min_{q ∈ Δ(A2)} max_{p ∈ Δ(A1)} u1(p,q).
• Consider the distribution q for player 2 given by the observed frequencies of player 2's play over T steps.
• There is a best response x ∈ A1 to q with u1(x, q) ≥ v.
• If player 1 had always played x, his expected gain would be at least vT.
• If player 1 follows a no-regret procedure, his gain is at least vT - R, where R/T → 0.
• Using RWM: the average gain is at least v - O((log n / T)^{1/2}).
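A hedged sketch of two RWM players in a zero-sum game, illustrating the convergence of empirical play; the function rwm_play, the shift of payoffs in [-1,1] into costs in [0,1], and the rock-paper-scissors example are illustrative choices, not from the lecture.

```python
import random

def rwm_play(payoff, T=10000, eps=0.05, rng=random.Random(0)):
    """Two RWM (multiplicative-weights) players in a zero-sum game.

    payoff[i][j]: gain of the row player (in [-1, 1]) when row i meets column j.
    The row player minimizes cost = -gain; the column player minimizes cost = gain.
    Returns the empirical (time-averaged) mixed strategies of both players.
    """
    n, m = len(payoff), len(payoff[0])
    wr, wc = [1.0] * n, [1.0] * m
    freq_r, freq_c = [0] * n, [0] * m
    for _ in range(T):
        pr = [w / sum(wr) for w in wr]
        pc = [w / sum(wc) for w in wc]
        i = rng.choices(range(n), pr)[0]
        j = rng.choices(range(m), pc)[0]
        freq_r[i] += 1
        freq_c[j] += 1
        # Multiplicative update against the opponent's realized action;
        # costs are shifted from [-1, 1] into [0, 1].
        wr = [w * (1 - eps) ** ((1 - payoff[k][j]) / 2) for k, w in enumerate(wr)]
        wc = [w * (1 - eps) ** ((1 + payoff[i][k]) / 2) for k, w in enumerate(wc)]
    return [f / T for f in freq_r], [f / T for f in freq_c]

# Rock-paper-scissors: value 0, unique minimax strategy is uniform.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
print(rwm_play(rps))
```

For large T, both returned frequency vectors should be close to (1/3, 1/3, 1/3), the minimax optimal play.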

26. Proof of the Minimax
Want to show v = max_{p ∈ Δ(A1)} min_{q ∈ Δ(A2)} u1(p,q) = min_{q ∈ Δ(A2)} max_{p ∈ Δ(A1)} u1(p,q).
Consider, for player 1:
• v1max = max_{x ∈ A1} min_{q ∈ Δ(A2)} u1(x,q)   (best fixed choice, not given player 2's distribution)
• v1min = min_{y ∈ A2} max_{p ∈ Δ(A1)} u1(p,y)   (best choice given player 2's distribution)
For player 2:
• v2max = max_{y ∈ A2} min_{p ∈ Δ(A1)} -u1(p,y)
• v2min = min_{x ∈ A1} max_{q ∈ Δ(A2)} -u1(x,q)
• Need to prove v1max = v1min. The easy direction: v1max ≤ v1min.
• Suppose v1min = v1max + ε for some ε > 0.
• Let players 1 and 2 each follow a no-regret procedure for T steps with regret R, where R/T < ε/2; the next slide derives a contradiction.

27. Proof of the Minimax (continued)
Recall v1max, v1min, v2max, v2min from the previous slide, and suppose v1min = v1max + ε for some ε > 0.
• Players 1 and 2 each follow a no-regret procedure for T steps with regret R, where R/T < ε/2.
• The realized total losses are L_T for player 1 and -L_T for player 2.
• Let L1 and L2 be the best-response losses against the opponents' empirical distributions.
• A best response to a known distribution does at least as well as the corresponding value:
  L1/T ≤ -v1min and L2/T ≤ v1max, so L1 + L2 ≤ -εT.
• But no-regret gives L_T ≤ L1 + R and -L_T ≤ L2 + R, hence L1 + L2 ≥ -2R > -εT.
• Contradiction; hence ε = 0 and v1max = v1min.

28. History and development
• [Hannan '57, Blackwell '56]: algorithms with regret O((N/T)^{1/2}).
  • Need T = O(N/ε²) steps to get time-average regret ε. Call this quantity T_ε.
  • Optimal dependence on T (or ε); views N as a constant.
• Learning theory, '80s-'90s: "combining expert advice". Perform (nearly) as well as the best f ∈ C; views N as large.
• [Littlestone-Warmuth '89]: the Weighted Majority algorithm:
  E[cost] ≤ OPT(1+ε) + (log N)/ε ≤ OPT + εT + (log N)/ε.
  • Regret O(((log N)/T)^{1/2}); T_ε = O((log N)/ε²).

29. Why Regret Minimization?
• Finding Nash equilibria can be computationally difficult.
• It is not clear that agents would converge to one, or would remain in one if there are several.
• Regret minimization is realistic:
  • There are efficient algorithms that minimize regret.
  • It is computed locally.
  • Players improve by lowering their regret.

30. Efficient implicit implementation for large n…
• The bounds have only a log dependence on n.
• So, conceivably, we can do well even when n is exponential in the natural problem size, if only we could implement the algorithm efficiently.
• E.g., the case of paths…
• Recent years: a series of results giving efficient implementations/alternatives in various settings.

31. The Eavesdropping Game
• Let G = (V, E).
• Player 1 chooses an edge e of E.
• Player 2 chooses a spanning tree T.
• Payoff: u1(e, T) = 1 if e ∈ T and 0 otherwise.
• The number of moves is exponential in |G|.
• But the best response for Player 2, given a distribution on the edges, is easy: solve a minimum spanning tree problem with the probabilities as edge weights.
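A minimal sketch of Player 2's best response, computed with Kruskal's algorithm using the edge probabilities as weights; the function mst_best_response and its input format are illustrative assumptions.

```python
def mst_best_response(nodes, edge_probs):
    """Player 2's best response to a distribution over edges: the spanning
    tree minimizing the probability mass of the edges it contains.

    edge_probs: dict mapping (u, v) edges to Player 1's probability of
    choosing that edge.
    """
    parent = {v: v for v in nodes}

    def find(v):                      # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (u, v), p in sorted(edge_probs.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding (u, v) creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree                       # a spanning tree if the graph is connected

# Example: a triangle where Player 1 puts most weight on edge (0, 1);
# the best response avoids that edge.
print(mst_best_response([0, 1, 2], {(0, 1): 0.6, (1, 2): 0.2, (0, 2): 0.2}))
```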

32. Correlated Equilibrium and Swap Regret
• What about Nash equilibria?
• For correlated equilibria: if every player's algorithm has low swap regret (no gain from swapping each occurrence of an action for some other fixed action), then the empirical distribution of play converges to a correlated equilibrium.

33. Back to the Peanuts Game
• There are n bins.
• At each round, "nature" throws a peanut into one of the bins.
• If you (the player) are at the chosen bin, you catch the peanut.
• Otherwise, you miss it.
• You may choose to move to any bin before any round.
• The game ends when d peanuts have been thrown at some bin.
Homework: what guarantees can you provide?
