CPSC 7373: Artificial Intelligence Lecture 15: Game Theory

Jiang Bian, Fall 2012, University of Arkansas at Little Rock

Game Theory
• Game theory is the study of strategic decision making: mathematical models of conflict and cooperation between intelligent, rational agents (decision-makers).
• In the last lecture, we studied:
  • Models of games, and how to use a search tree to compute the utility of each state.
  • Turn-based games.
• In this lecture, we will discuss simultaneous actions (cooperation):
  • Each player makes a decision without knowing the opponent's move, so the environment is partially observable.
• Two goals of this lecture:
  • Agent design: given the game, find the optimal policy, which may depend on the opponent's policy (and that policy may not be directly observable).
  • Mechanism design (game design): given the agents' utility functions, design the rules of the game so that the global utility is maximized when every agent acts rationally.

Prisoner’s Dilemma
• Two criminals: Alice and Bob.
• They are caught together at the scene of a crime.
• The police offer each of them, independently, a deal: “If you testify against your partner, you will get reduced jail time.”
• This is a non-zero-sum game.
• Both are perfectly rational.
• What strategy should Alice and Bob adopt?

Dominant Strategy
• Dominant strategy: a strategy with which a player does better than with any other strategy, no matter what the other player does.
• Does either Alice or Bob have a dominant strategy?
• Testify is the dominant strategy for both.
  • For Alice: if Bob testifies, testifying is better than refusing (-5 > -10); if Bob refuses, testifying is still better (0 > -1). The same holds for Bob, by symmetry.
  • i.e., the choice of Alice does not change the way Bob will act, and vice versa.
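The dominance check above can be written out as a short Python sketch. The payoff table is not preserved in this transcript; the one below is reconstructed from the values quoted on the slides (-5, -10, 0, -1) and should be read as an assumption. The names `PAYOFFS`, `utility`, and `dominant_strategy` are illustrative only.

```python
# Payoffs (Alice, Bob) in years of jail time, written as negative utilities.
# Reconstructed from the numbers quoted on the slides; treat as an assumption.
ACTIONS = ("testify", "refuse")
PAYOFFS = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def utility(player, own, other):
    """Payoff to `player` (0 = Alice, 1 = Bob) when it plays `own`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def dominant_strategy(player):
    """Return an action that beats every alternative against every
    opponent action, or None if no such action exists."""
    for action in ACTIONS:
        if all(
            utility(player, action, other) > utility(player, alt, other)
            for alt in ACTIONS
            if alt != action
            for other in ACTIONS
        ):
            return action
    return None


print(dominant_strategy(0), dominant_strategy(1))  # testify testify
```

The check is strict (`>`); replacing it with `>=` would test for a weakly dominant strategy instead.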

Pareto Optimal Outcome
• Pareto optimality is a measure of efficiency.
• A Pareto optimal outcome is one in which no one could be made better off without making someone else worse off.
• i.e., an outcome of the game is said to be Pareto efficient if there is no other outcome in which some individual is better off and no individual is worse off.
• The Pareto optimal outcomes in this case are:
  • A = -10, B = 0; A = 0, B = -10; and A = -1, B = -1.
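As a rough check of the definition, the sketch below filters out every outcome that is Pareto-dominated by another one. The payoff table is reconstructed from the values quoted on the slides and is an assumption; `pareto_dominates` and `pareto_optimal` are hypothetical helper names.

```python
# Payoffs (Alice, Bob); reconstructed from the slides, so an assumption.
OUTCOMES = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def pareto_dominates(x, y):
    """True if payoff vector x makes nobody worse off than y
    and at least one player strictly better off."""
    no_one_worse = all(a >= b for a, b in zip(x, y))
    someone_better = any(a > b for a, b in zip(x, y))
    return no_one_worse and someone_better


def pareto_optimal(outcomes):
    """Return the strategy profiles whose payoffs no other outcome dominates."""
    return [
        profile
        for profile, payoff in outcomes.items()
        if not any(pareto_dominates(other, payoff) for other in outcomes.values())
    ]


for profile in pareto_optimal(OUTCOMES):
    print(profile, OUTCOMES[profile])
```

Only (testify, testify) is dropped, because (refuse, refuse) gives both players -1 instead of -5.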

Equilibrium
• A Nash equilibrium of a game is a strategy combination such that no player can improve its situation by changing its own strategy, assuming the strategies of the other players stay the same.
• John Nash (the subject of the 2001 movie “A Beautiful Mind”) showed that every finite game has at least one equilibrium point, possibly in mixed strategies.
• The Nash equilibrium is a solution concept for non-cooperative games involving two or more players.
• The simple insight underlying John Nash's idea is that we cannot predict the result of the choices of multiple decision makers if we analyze those decisions in isolation. Instead, we must ask what each player would do, taking into account the decision-making of the others.
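A minimal sketch of the definition for two-player games in matrix form: a profile is a pure-strategy equilibrium if neither player gains by deviating alone. It is applied to the prisoner's dilemma payoffs reconstructed from the slides (an assumption); `pure_nash_equilibria` is a hypothetical helper, and it only looks for pure strategies.

```python
ACTIONS = ("testify", "refuse")
# Payoffs (Alice, Bob); reconstructed from the slides, so an assumption.
PAYOFFS = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every profile where no single player can do strictly
    better by changing only its own action."""
    equilibria = []
    for a in actions:
        for b in actions:
            u_a, u_b = payoffs[(a, b)]
            a_cannot_improve = all(payoffs[(alt, b)][0] <= u_a for alt in actions)
            b_cannot_improve = all(payoffs[(a, alt)][1] <= u_b for alt in actions)
            if a_cannot_improve and b_cannot_improve:
                equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('testify', 'testify')]
```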

Dilemma
• The prisoner’s game turns out to be a dilemma because:
  • there is an equilibrium point, both testify, where two rational players seem bound to end up; whereas
  • the Pareto optimal outcome that is best for the two of them together is that both refuse to testify.
• However, being rational, neither Alice nor Bob can see a way to reach the preferred outcome.

Game Console Game
• Two game manufacturers, A and B, are deciding whether to use Blu-ray or DVD in their next game console.
• Questions:
  • Is there a dominant strategy for A, B, neither, or both?
  • Is there an equilibrium outcome?
  • Are there any Pareto optimal outcomes?

Game Console Game
• Two game manufacturers, A and B, are deciding whether to use Blu-ray or DVD in their next game console.
• Questions:
  • Is there a dominant strategy for A, B, neither, or both? Neither.
    • For each player, the best strategy depends on the other player’s strategy. The best outcome is when they match.
  • Is there an equilibrium outcome? Yes, two: A = +9, B = +9 and A = +5, B = +5.
    • At either outcome, no player can improve its payoff by changing only its own strategy.
  • Are there any Pareto optimal outcomes? A = +9, B = +9.
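The best-response reasoning can be sketched in Python. Only the matched payoffs (+9, +9) and (+5, +5) come from the slide; the payoff table itself is not preserved, so the two mismatched entries below are assumed values chosen only to make a mismatch worse for both players. `payoff` and `best_response` are hypothetical helper names.

```python
ACTIONS = ("bluray", "dvd")
# Payoffs (A, B), keyed by (A's choice, B's choice).
PAYOFFS = {
    ("bluray", "bluray"): (9, 9),  # from the slide
    ("dvd", "dvd"): (5, 5),        # from the slide
    ("bluray", "dvd"): (-4, -1),   # assumed for illustration
    ("dvd", "bluray"): (-3, -1),   # assumed for illustration
}


def payoff(player, own, other):
    """Payoff to `player` (0 = A, 1 = B) when it plays `own`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def best_response(player, other):
    """The action that maximizes `player`'s payoff against `other`."""
    return max(ACTIONS, key=lambda own: payoff(player, own, other))


# A player has a dominant strategy only if its best response never changes.
for player, name in enumerate("AB"):
    responses = {other: best_response(player, other) for other in ACTIONS}
    print(name, responses)

# An equilibrium is a pair of mutual best responses.
equilibria = [
    (a, b)
    for a in ACTIONS
    for b in ACTIONS
    if best_response(0, b) == a and best_response(1, a) == b
]
print(equilibria)  # [('bluray', 'bluray'), ('dvd', 'dvd')]
```

Under these assumed payoffs each player's best response copies the other's choice, which is why neither has a dominant strategy and both matching outcomes are equilibria.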

Two Finger Morra
• Two players, Even and Odd, simultaneously show 1 or 2 fingers.
• If the total number of fingers is:
  • even: Even gains that many points from Odd;
  • odd: Odd gains that many points from Even.
• There is no dominant strategy.
• Mixed strategy: a strategy in which the action is drawn from a probability distribution.
• Pure strategy: a strategy of always playing the same single action.

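Two Finger Morra Solution Tree
• Even: maximize; Odd: minimize (utilities are from Even's point of view).
• Assuming Even goes first:
  • Even plays 1: Odd chooses between 2 (Odd plays 1) and -3 (Odd plays 2), so the value is -3.
  • Even plays 2: Odd chooses between -3 (Odd plays 1) and 4 (Odd plays 2), so the value is -3.
  • Value at the root (Even): -3.
• This puts Even at a disadvantage, because he reveals his strategy first.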

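Two Finger Morra Solution Tree
• Even: maximize; Odd: minimize (utilities are from Even's point of view).
• Assuming Even goes first: both of Even's moves lead to -3, so UE = -3.
• Assuming Odd goes first:
  • Odd plays 1: Even chooses between 2 (Even plays 1) and -3 (Even plays 2), so the value is 2.
  • Odd plays 2: Even chooses between -3 (Even plays 1) and 4 (Even plays 2), so the value is 4.
  • Odd picks the smaller value, so UE = 2.
• The true game is played simultaneously, so -3 <= UE <= 2.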
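The two bounds can be reproduced with a few lines of Python, using only the rules of the game stated earlier. `even_utility` is an illustrative name; utilities are from Even's point of view.

```python
MOVES = (1, 2)


def even_utility(even_move, odd_move):
    """Even wins the total if it is even, and loses the total if it is odd."""
    total = even_move + odd_move
    return total if total % 2 == 0 else -total


# Even moves first: Odd sees the move and minimizes, Even maximizes over that.
even_first = max(min(even_utility(e, o) for o in MOVES) for e in MOVES)

# Odd moves first: Even sees the move and maximizes, Odd minimizes over that.
odd_first = min(max(even_utility(e, o) for e in MOVES) for o in MOVES)

print(even_first, odd_first)  # -3 2
```

Whoever must reveal a pure move first is worse off, which is what motivates looking at mixed strategies.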

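Mixed Strategy
• Mixed strategy: the action of a player is a probability distribution over his/her possible moves.
• Even: {p: one; (1-p): two}. Odd: {q: one; (1-q): two}.
• If Even goes first (commits to p), Even's expected utility is:
  • 2p - 3(1-p) if Odd plays one;
  • -3p + 4(1-p) if Odd plays two.
• If Odd goes first (commits to q), Even's expected utility is:
  • 2q - 3(1-q) if Even plays one;
  • -3q + 4(1-q) if Even plays two.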

Mixed Strategy: Solution
• What value of p should Even choose?
• The value that makes Odd indifferent between its two moves: 2p - 3(1-p) = -3p + 4(1-p)
• Simplifying: 5p - 3 = 4 - 7p, so p = 7/12
• Plugging p back into either expression, we get UE = -1/12

Mixed Strategy: Solution
• Even first: p = 7/12; UE = -1/12
• Odd first: q = 7/12; UE = -1/12
• The two bounds coincide: -1/12 <= UE <= -1/12, so UE = -1/12.
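A brute-force sanity check of these numbers, as a sketch: scan candidate probabilities and keep the one with the best worst case. The grid of twelfths is an assumption made only because the answer is already known to be a multiple of 1/12; a general solver would use the indifference equation or linear programming instead.

```python
from fractions import Fraction

# Candidate probabilities 0/12, 1/12, ..., 12/12 (assumed grid).
GRID = [Fraction(k, 12) for k in range(13)]


def even_worst_case(p):
    """Even commits to p; Odd then picks the reply that is worse for Even."""
    return min(2 * p - 3 * (1 - p), -3 * p + 4 * (1 - p))


def odd_worst_case(q):
    """Odd commits to q; Even then picks the reply that is better for Even."""
    return max(2 * q - 3 * (1 - q), -3 * q + 4 * (1 - q))


best_p = max(GRID, key=even_worst_case)  # Even maximizes its guaranteed payoff
best_q = min(GRID, key=odd_worst_case)   # Odd minimizes what Even can get

print(best_p, even_worst_case(best_p))  # 7/12 -1/12
print(best_q, odd_worst_case(best_q))   # 7/12 -1/12
```

Using `Fraction` keeps the arithmetic exact, so the result matches the slide's -1/12 rather than a rounded float.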

Mixed Strategy Issues
• Mixed strategies raise some curious philosophical problems related to:
• Randomness: the action is drawn from a probability distribution.
• Secrecy:
  • The action itself needs to be a secret (otherwise the opponent can react and make a better choice accordingly).
  • However, the strategy itself is not a secret, since the opponent can compute our rational strategy as well.
• Rationality: a rational agent is one that does the right thing. However,
  • You can perform better if your opponent believes you are NOT rational.
  • e.g., one action available to a nation’s leader is to go to war, and both sides understand that the strategy of going to war is dominated by other strategies. So,
    • The action of going to war is irrational.
    • A threat of “Give me this concession, or I’ll go to war against you” is not a credible threat.
    • However, if a leader can convince the opponent that he is irrational or crazy, then the threat suddenly becomes credible.
  • Being irrational does not help, but appearing irrational can gain you advantages.

Mixed Strategy: E1
• What are p and q?
• What is the utility of Max (Umax)?

Mixed Strategy: E1 - Solution
• What are p and q?
  • p = 1; q = 0
  • Both players have dominant strategies:
    • For Max: 1 is always better than 2
    • For Min: 2 is always better than 1
• What is the utility of Max (Umax)?
  • Umax = 3

Mixed Strategy: E2
• What are p and q?
• What is the utility of Max (Umax)?

Mixed Strategy: E2 - Solution
• What are p and q?
  • Max: 3p + 5(1-p) = 6p + 4(1-p), so p = ¼
  • Min: 3q + 6(1-q) = 5q + 4(1-q), so q = ½
• What is the utility of Max (Umax)?
  • Umax = 3 * ¼ * ½ + 6 * ¼ * ½ + 5 * ¾ * ½ + 4 * ¾ * ½ = 4.5
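The two indifference equations generalize to any 2x2 zero-sum game without a saddle point. The sketch below solves them in closed form; the payoff matrix is read off the coefficients of the equations above (the original table is not preserved, so this is an assumption), and `solve_2x2` is a hypothetical helper name.

```python
from fractions import Fraction


def solve_2x2(matrix):
    """Solve a 2x2 zero-sum game with no saddle point.

    matrix[i][j] is the utility to Max when Max plays row i and Min plays
    column j. Returns (p, q, value), where p = P(Max plays row 0) and
    q = P(Min plays column 0).
    """
    (a, b), (c, d) = matrix
    # Max picks p so that Min is indifferent: a*p + c*(1-p) = b*p + d*(1-p)
    p = Fraction(d - c, (a - c) - (b - d))
    # Min picks q so that Max is indifferent: a*q + b*(1-q) = c*q + d*(1-q)
    q = Fraction(d - b, (a - b) - (c - d))
    value = a * p + c * (1 - p)
    return p, q, value


# Rows: Max plays 1 or 2. Columns: Min plays 1 or 2.
E2 = [[3, 6],
      [5, 4]]

p, q, u_max = solve_2x2(E2)
print(p, q, u_max)  # 1/4 1/2 9/2
```

The closed form divides by zero or returns probabilities outside [0, 1] when a player has a dominant strategy, so a game like E1 should be checked for dominance first.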

Geometric Interpretation
• Two finger Morra game:
  • Even is trying to maximize U;
  • Odd is trying to minimize U.

Poker Game
• Deck: K, K, A, A
• Deal: 1 card each
• Rounds:
  • (1) raise/check
  • (2) call/fold
• This is a sequential game in extensive form: we keep track of the belief states, i.e., the possibilities of what each agent knows and doesn’t know.

Game Theory Strategies
• For a real poker game, we have to track 10^18 states.
• How do we reduce the number of states?
• The best way is to abstract: take similar states and treat them as one. e.g., in the poker game, we can:
  • Rather than treating 2, 3, 4, and 5 as different values, if I have a pair of 10s, think of the other players’ cards as being equal to 10, lower than 10, or higher than 10.
  • Lump bets together into small, medium, and large.
• Game theory can handle: uncertainty, partial observability, multiple agents, and stochastic, sequential, and dynamic environments.
• But game theory can’t handle:
  • Unknown actions (i.e., we have to know all actions up front)
  • Continuous actions (i.e., the matrix form needs a discrete set of actions)
  • Irrational opponents
  • Unknown utilities (i.e., we have to know what we are trying to optimize)

Fed vs Politicians
• A game played between the Federal Reserve board and politicians.
• The politicians can choose to contract fiscal policy (-), expand it (+), or do nothing (0), and the Fed has the same three choices.
• The outcomes are ranked for each party from 1 (the worst outcome) to 9 (the best outcome).
• Find the equilibrium point for this game.
  • There is exactly one equilibrium point.
  • The equilibrium point defines a pure strategy for each player.
• Is it Pareto optimal? Yes/No
• UF = ??? and UP = ???

Fed vs Politicians
• Find the equilibrium point for this game.
  • Politicians expand (+), Fed contracts (-): F = 3, P = 3
• Is it Pareto optimal? No
• UF = 3 and UP = 3

Mechanism Design
• Mechanism design (game design):
  • We want to design the rules of the game so that we get a high outcome, or a high expected utility, for the people who run the game and for the players who play it.
  • For example, when designing an auction site like eBay, the goal is to come up with auction rules that make the site attractive to bidders and/or people who want to respond to the ads, and that give a good result for all.
  • You can attract more players if playing is less work for them, so you should design the rules so that dominant strategies exist.
• Strategyproof: an asymmetric game where players have private information is said to be strategyproof (truth-revealing, or incentive compatible) if no player has an incentive to lie about, or hide, their private information from the other players. e.g., voting systems, auctions, etc.

Second-Price Auction
• Value = v; your bid = b; highest other bid = c.
• The winner pays the highest other bid. Your earning: if b > c: v - c; else: 0.
• Example: v = 10.
• [Table: winner and cost for different bids]
• Is there a dominant strategy? Which one?
• Is it truth-revealing? Yes/No

Second-Price Auction
• Value = v; your bid = b; highest other bid = c.
• The winner pays the highest other bid. Your earning: if b > c: v - c; else: 0.
• Example: v = 10.
• [Table: winner and cost for different bids]
• Is there a dominant strategy? Which one? b = 10 (weakly dominant).
• Is it truth-revealing? Yes
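The weak dominance of bidding b = v can be checked exhaustively on a small grid. This is a sketch under the earning rule from the slide; the integer bid range 0..20 is an arbitrary assumption, and `earning` and `truthful_never_worse` are illustrative names.

```python
def earning(v, b, c):
    """Earning rule from the slide: win only if b > c, then pay c."""
    return v - c if b > c else 0


def truthful_never_worse(v, bids, other_bids):
    """True if bidding exactly v earns at least as much as any other
    bid b, for every possible highest competing bid c."""
    return all(
        earning(v, v, c) >= earning(v, b, c)
        for b in bids
        for c in other_bids
    )


V = 10
GRID = range(0, 21)  # assumed integer bid range for the check

print(truthful_never_worse(V, GRID, GRID))  # True

# Overbidding can win at a loss; underbidding can miss a profitable win.
print(earning(V, 12, 11), earning(V, 10, 11))  # -1 0
print(earning(V, 8, 9), earning(V, 10, 9))     # 0 1
```

The dominance is only weak because many bids tie with the truthful one for a given c; for example, any bid above c earns the same v - c.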
