
Bayesian and non-Bayesian Learning in Games


Presentation Transcript


  1. Bayesian and non-Bayesian Learning in Games Ehud Lehrer Tel Aviv University, School of Mathematical Sciences Including joint works with: Ehud Kalai, Rann Smorodinsky, Eilon Solan.

  2. Learning in Games
  Informal definition of learning: a decentralized process that converges (in some sense) to (some) equilibrium.
  • Bayesian (rational) learning: Players do not start in equilibrium, but
    • they have some initial belief about other players’ strategies
    • they are rational: they maximize their payoffs
    • they take into account future payoffs
    • Convergence in the REPEATED GAME
  • Non-Bayesian learning: Players
    • don’t have any initial belief about other players’ strategies
    • don’t maximize their payoffs
    • don’t take into account future payoffs
    • Convergence (of the empirical frequency) to an equilibrium of the ONE-SHOT GAME

  3. Bayesian vs. non-Bayesian
  Bayesian learning: Players do not start in equilibrium, but they start with a “grain” of an idea about what the other players do.
  Nature of results: players eventually play something close to an equilibrium of the repeated game.
  Non-Bayesian learning: Players have no idea about other players’ actions, and they do not try to maximize payoffs.
  Nature of results: the statistics of past actions look like an equilibrium of the one-shot game.

  4. Important tools
  • Bayesian learning: merging of two probability measures along a filtration (an increasing sequence of σ-fields).
  • Non-Bayesian learning: approachability.
  Both were initiated by Blackwell (the first with Dubins).
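  As a toy illustration of the merging idea, not taken from the slides: the sketch below assumes repeated flips of a coin whose true bias lies in the support of the observer’s prior (a “grain of truth”), and shows the belief’s one-step-ahead predictions approaching the true probabilities as observations accumulate along the filtration. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

true_p = 0.7                            # true probability of Heads (assumption for the demo)
support = np.array([0.3, 0.5, 0.7])     # belief = mixture over these candidate coins
prior = np.array([0.4, 0.4, 0.2])       # puts positive weight on the truth ("grain of truth")

posterior = prior.copy()
for n in range(1, 2001):
    heads = rng.random() < true_p                 # observe one flip
    like = support if heads else 1.0 - support    # likelihood of the observation under each coin
    posterior = posterior * like
    posterior /= posterior.sum()                  # Bayes update along the filtration
    predicted = posterior @ support               # belief's one-step-ahead P(Heads)
    if n % 500 == 0:
        print(f"n={n:4d}  predicted P(H)={predicted:.3f}  true P(H)={true_p}")
```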

  5. Repeated Games with Vector Payoffs
  • I = finite set of actions of player 1.
  • J = finite set of actions of player 2.
  • M = (m_{i,j}) = a payoff matrix. Entries are vectors in R^d.
  A set F is approachable by player 1 if there is a strategy σ of player 1 such that, against every strategy of player 2, the average payoff x̄_n = (1/n) Σ_{t≤n} m_{i_t,j_t} converges to F almost surely.
  A set F is excludable by player 2 if there is δ > 0 and a strategy τ of player 2 such that, against every strategy of player 1, the average payoff eventually stays at distance at least δ from F almost surely.
  There are sets which are neither approachable nor excludable.
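  To fix the notation computationally, here is a minimal sketch (not from the slides): the matrix M is stored as an |I| × |J| × d array, and the quantity being steered is the distance from the average payoff to a target set F. The game and the target set below are made-up examples.

```python
import numpy as np

# Hypothetical 2x2 game with 2-dimensional vector payoffs (illustrative numbers).
M = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[-1.0, 2.0], [2.0, -1.0]]])   # shape (|I|, |J|, d)

def average_payoff(history):
    """Average vector payoff after the plays in `history` (a list of (i, j) pairs)."""
    return np.mean([M[i, j] for i, j in history], axis=0)

def distance_to_set(x, F_points):
    """Distance from the average payoff x to a target set given by sample points."""
    return min(np.linalg.norm(x - f) for f in F_points)

history = [(0, 1), (1, 0), (0, 0)]
x_bar = average_payoff(history)
print(x_bar, distance_to_set(x_bar, [np.zeros(2)]))   # F = {0} as a toy target
```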

  6. Approachability
  Applications (a sample):
  • No-regret (Hannan)
  • Repeated games with incomplete information (Aumann-Maschler)
  • Learning (Foster-Vohra, Hart-Mas Colell)
  • Manipulation of calibration tests (Foster-Vohra, Lehrer, Smorodinsky-Sandroni-Vohra)
  • Generating generalized normal numbers (Lehrer)

  7. Characterization of Approachable Sets
  [Figure: a point x outside F, a closest point y in F, the line xy, and the hyperplane through y perpendicular to xy separating x from H(p).]
  m_{p,q} = Σ_{i,j} p_i m_{i,j} q_j,   H(p) = { m_{p,q} : q ∈ Δ(J) }.
  A closed set F ⊆ R^d is a B-set if for every x ∉ F there is y ∈ F that satisfies:
  • y is a closest point in F to x.
  • The hyperplane perpendicular to the line xy that passes through y separates x from H(p), for some p ∈ Δ(I).
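  The separation condition can be checked numerically. The sketch below is an illustration, not code from the talk; it assumes F enters only through a candidate closest point y. Since H(p) is the convex hull of the columns m_{p,j}, it is enough to test the pure actions of player 2, and finding a suitable p reduces to a small linear program (SciPy's linprog).

```python
import numpy as np
from scipy.optimize import linprog

def separation_holds(x, y, M, tol=1e-9):
    """Check the B-set separation condition at x, with y a closest point of F to x.

    M has shape (|I|, |J|, d).  We look for p in Delta(I) such that
    <x - y, m_{p,j} - y> <= 0 for every pure action j of player 2 (hence for every
    mixed q), i.e. min over p of max over j of <x - y, m_{p,j}> is at most <x - y, y>.
    That min-max is the value of a zero-sum game with scalar payoffs
    u[i, j] = <x - y, m[i, j]>, computed here by linear programming.
    """
    w = x - y
    u = M @ w                                   # u[i, j] = <w, m[i, j]>
    n_i, n_j = u.shape
    # Variables (p_1, ..., p_I, v): minimize v s.t. sum_i p_i u[i, j] <= v for all j.
    c = np.r_[np.zeros(n_i), 1.0]
    A_ub = np.c_[u.T, -np.ones(n_j)]
    b_ub = np.zeros(n_j)
    A_eq = np.r_[np.ones(n_i), 0.0][None, :]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_i + [(None, None)])
    p, value = res.x[:n_i], res.x[-1]
    return value <= w @ y + tol, p

# Example: x outside the nonpositive orthant, y its closest point (made-up payoffs).
M = np.array([[[1.0, -1.0], [-1.0, 1.0]],
              [[-2.0, 0.5], [0.5, -2.0]]])
x = np.array([0.4, 0.2]); y = np.minimum(x, 0.0)
print(separation_holds(x, y, M))   # the condition happens to hold for these numbers
```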

  8. Characterization of Approachable Sets
  Theorem [Blackwell, 1956]: every B-set F is approachable. The approaching strategy plays at each stage n the mixed action p such that H(p) and the current average payoff x̄_n are separated by the hyperplane through a closest point to x̄_n in F, perpendicular to the segment connecting them. With this strategy the average payoff converges to F almost surely.
  Theorem [Blackwell, 1956]: every convex set is either approachable or excludable.
  Theorem [Hou, 1971; Spinat, 2002]: every minimal (w.r.t. set inclusion) approachable set is a B-set. Or: a set is approachable if and only if it contains a B-set.
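  For a runnable illustration (a sketch under simple assumptions, not the talk's own code): in the no-regret instance mentioned on slide 6, the vector payoff at each stage is the regret vector and the target set is the nonpositive orthant; there, Blackwell's approaching strategy amounts to playing each action with probability proportional to its positive average regret. The scalar payoff matrix and the opponent's (random) play below are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar payoff matrix of player 1 (illustrative numbers), actions I x J.
U = np.array([[1.0, 0.0, 0.3],
              [0.2, 0.8, 0.5]])
n_actions = U.shape[0]

avg_regret = np.zeros(n_actions)   # average of the vector payoffs r_t(k) = U[k, j_t] - U[i_t, j_t]
for t in range(1, 5001):
    # Blackwell's strategy for the nonpositive orthant: play p proportional
    # to the positive part of the average regret (uniform if all regrets <= 0).
    pos = np.maximum(avg_regret, 0.0)
    p = pos / pos.sum() if pos.sum() > 0 else np.full(n_actions, 1.0 / n_actions)
    i = rng.choice(n_actions, p=p)
    j = rng.integers(U.shape[1])                 # opponent plays arbitrarily (here: at random)
    r = U[:, j] - U[i, j]                        # this stage's vector payoff (regret vector)
    avg_regret += (r - avg_regret) / t           # running average

print("max average regret after 5000 stages:", avg_regret.max())
```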

  9. Bounded Computational Capacity
  A strategy is k-bounded-recall if it depends only on the last k pairs of actions (and does not depend on actions played earlier).
  A (non-deterministic) automaton is given by:
  • A finite state space.
  • A probability distribution over states, according to which the initial state is chosen.
  • A set of inputs (say, the set I × J of action pairs).
  • A set of outputs (say, I, the set of player 1’s actions).
  • A rule that assigns to each state a probability distribution over outputs.
  • A transition rule that assigns to every state and every input a probability distribution over the next state.
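  A direct transcription of this definition into a data structure (a sketch; the class name and array layout are my own choices):

```python
import numpy as np

class ProbabilisticAutomaton:
    """Non-deterministic (probabilistic) automaton as described above.

    init_dist : distribution over the finite state space (length S)
    output    : output[s] = distribution over player 1's actions in state s (S x |I|)
    transition: transition[s, i, j] = distribution over next states (S x |I| x |J| x S)
    """

    def __init__(self, init_dist, output, transition, rng=None):
        self.rng = rng or np.random.default_rng()
        self.output = np.asarray(output)
        self.transition = np.asarray(transition)
        self.state = self.rng.choice(len(init_dist), p=init_dist)

    def act(self):
        """Draw player 1's action from the output rule of the current state."""
        return self.rng.choice(self.output.shape[1], p=self.output[self.state])

    def update(self, i, j):
        """Move to the next state according to the transition rule and the input (i, j)."""
        dist = self.transition[self.state, i, j]
        self.state = self.rng.choice(len(dist), p=dist)

# A one-state automaton that plays either action uniformly, whatever the input:
A = ProbabilisticAutomaton([1.0], [[0.5, 0.5]], np.ones((1, 2, 2, 1)))
i = A.act(); A.update(i, 0)
```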

  10. Approachability and Bounded Capacity
  A set F is approachable with bounded-recall strategies by player 1 if for every ε > 0 the set B(F, ε) := { y : d(y, F) ≤ ε } is approachable by some bounded-recall strategy.
  A set F is excludable against bounded-recall strategies by player 2 if player 2 has a strategy τ such that, against every bounded-recall strategy of player 1, the average payoff eventually stays at some fixed positive distance from F.
  Theorem (w/ Eilon Solan): The following statements are equivalent.
  • The set F is approachable with bounded-recall strategies.
  • The set F is approachable with automata.
  • The set F contains a convex approachable set.
  • The set F is not excludable against bounded-recall strategies.
  4 points to note.

  11. Main Theorem
  Theorem: The following statements are equivalent for closed sets.
  • The set F is approachable with bounded-recall strategies.
  • The set F is approachable with automata.
  • The set F contains a convex approachable set.
  • The set F is not excludable against bounded-recall strategies.
  Points to note:
  1. A set is approachable with automata if and only if it is approachable by bounded-recall strategies.
  2. A complete characterization of sets that are approachable with bounded-recall strategies.
  3. A set which is not approachable with bounded-recall strategies is excludable against all bounded-recall strategies.
  4. We do not know whether the same holds for automata.

  12. Example
  On board.
  Good news: in applications target sets are convex (a point or a whole -- positive or negative -- orthant).

  13. Approachability in Hilbert space
  • I = finite set of actions of player 1.
  • J = finite set of actions of player 2.
  • M = (m_{i,j}) = a payoff matrix. Entries are points in a Hilbert space (random variables).
  • All may change with the stage n.
  A set F is approachable by player 1 if there is a strategy σ such that, against every strategy of player 2, the average payoff converges to F almost surely.
  Advantage: allows for infinitely many constraints.
  Theorem: Suppose that at stage n the average payoff is x̄_n and y_n is a closest point in F to x̄_n. If the hyperplane perpendicular to the line x̄_n y_n that passes through y_n separates x̄_n from H(p), for some p ∈ Δ(I), then F is approachable.
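  As a small sketch of what "payoffs in a Hilbert space" can mean concretely (my own toy example; the slides do not give one): random variables over a finite probability space, with ⟨X, Y⟩ = E[XY] taken as the inner product.

```python
import numpy as np

# A finite probability space with 3 states (probabilities are illustrative).
probs = np.array([0.5, 0.3, 0.2])

def inner(X, Y):
    """Hilbert-space inner product of two random variables: <X, Y> = E[XY]."""
    return np.sum(probs * X * Y)

def norm(X):
    return np.sqrt(inner(X, X))

# Vector payoffs are now random variables: each entry m[i, j] is a function of
# the state, encoded as a length-3 array (illustrative numbers).
m = {(0, 0): np.array([1.0, -1.0, 0.0]),
     (0, 1): np.array([0.0, 2.0, -1.0]),
     (1, 0): np.array([-1.0, 0.0, 1.0]),
     (1, 1): np.array([0.5, 0.5, 0.5])}

x_bar = (m[(0, 1)] + m[(1, 0)]) / 2          # average payoff after two stages
print("distance of the average payoff from 0:", norm(x_bar))
```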

  14. Approachability and the law of large numbers
  X_1, X_2, ... are uncorrelated random variables (say, uniformly bounded). ⟨X, Y⟩ = E[XY] is the dot product. F is the set {0}. At any stage n, the separation condition of the previous theorem holds.

  15. The game: each player has only one action. The payoff at stage n is X_n - E[X_n]. Thus, F is approachable. This is the strong law of large numbers. (When the payoffs are not uniformly bounded, there is an additional boundedness condition.)
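  A quick numerical check of this reading of the law of large numbers (a sketch with made-up bounded variables, taken independent, hence uncorrelated, for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent (hence uncorrelated) bounded r.v.'s with stage-dependent means.
n = 100_000
means = 0.5 + 0.3 * np.sin(np.arange(1, n + 1))   # E[X_t], illustrative
X = means + rng.uniform(-1.0, 1.0, size=n)        # bounded noise around the mean

running_avg = np.cumsum(X - means) / np.arange(1, n + 1)   # average payoff at stage n
for k in (10, 1_000, 100_000):
    print(f"n={k:7d}  |average payoff| = {abs(running_avg[k - 1]):.5f}")   # drifts to 0 (F = {0})
```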

  16. Problem: Approachability in normed spaces.

  17. Activeness function
  The Hilbert space H may have infinitely many coordinates (even over a finite probability space). At stage n a characteristic (0–1) function indicates which coordinates are active and which are not. The average payoff at stage n averages each coordinate only over the stages at which that coordinate was active.
  Applications:
  1. repeated games with incomplete information – different games are active at different times
  2. construction of normal numbers
  3. manipulability of many calibration tests
  4. general no-regret theorem (against many replacing schemes)
  5. convergence to correlated eq. along many sequences
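  A sketch of the bookkeeping this suggests (my own reading: each coordinate is averaged only over its active stages; the numbers are illustrative):

```python
import numpy as np

def active_averages(payoffs, active):
    """Average each coordinate only over the stages at which it was active.

    payoffs : array of shape (n_stages, d) -- vector payoffs
    active  : 0/1 array of shape (n_stages, d) -- activeness indicator per stage
    Returns the length-d vector of per-coordinate active averages (NaN if a
    coordinate was never active).
    """
    payoffs = np.asarray(payoffs, dtype=float)
    active = np.asarray(active, dtype=float)
    counts = active.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, (payoffs * active).sum(axis=0) / counts, np.nan)

# Tiny example: 3 stages, 2 coordinates; coordinate 1 is inactive at stage 2.
payoffs = [[1.0, 4.0], [2.0, 9.9], [3.0, 6.0]]
active = [[1, 1], [1, 0], [1, 1]]
print(active_averages(payoffs, active))   # -> [2.0, 5.0]
```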

  18. Activeness function – cont.
  Theorem: suppose that F is convex. Let y_n be the closest point in F to the average payoff x̄_n at time n. If the hyperplane perpendicular to the line x̄_n y_n that passes through y_n separates x̄_n from H(p), for some p ∈ Δ(I), then F is approachable.
