
Unit III: The Evolution of Cooperation



Presentation Transcript


  1. Unit III: The Evolution of Cooperation • Can Selfishness Save the Environment? • Repeated Games: the Folk Theorem • Evolutionary Games • A Tournament • How to Promote Cooperation

  2. Can Selfishness Save the Environment? • The Problem of Cooperation • The Tragedy of the Global Commons? • Common Resource Game • We Play a Game • Repeated Games • Discounting • The Folk Theorem

  3. The Problem of Cooperation How can a number of individuals, each behaving as a utility maximizer, come to behave as a group and maximize joint utility?

  4. The Problem of Cooperation • Assurance Game Players may fail to cooperate (i.e., fail to maximize joint payoffs) because they lack information. If each has reason to believe the other will cooperate, the problem is solved!

  Assurance Game
        C     D
  C   6,6   0,5
  D   5,0   1,1

  5. The Problem of Cooperation • Prisoner’s Dilemma vs. Assurance Game

  Prisoner’s Dilemma        Assurance Game
        C     D                   C     D
  C   3,3   0,5             C   6,6   0,5
  D   5,0   1,1             D   5,0   1,1

  6. The Problem of Cooperation • Prisoner’s Dilemma In the Prisoner’s Dilemma, there is no belief that will lead the players to cooperate. Rather than a problem of information, this is a problem of incentives. Cooperation is both inaccessible and unstable.

  Prisoner’s Dilemma        Assurance Game
        C     D                   C     D
  C   3,3   0,5             C   6,6   0,5
  D   5,0   1,1             D   5,0   1,1
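The contrast between the two games can be checked mechanically. Below is a minimal sketch (the matrices are taken from the slides; the helper name best_response is my own): in the Prisoner’s Dilemma the row player’s best reply is D no matter what the other does, while in the Assurance Game C is the best reply to C.

```python
# Stage games from the slides; payoffs are (row, column).
PD = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
ASSURANCE = {('C', 'C'): (6, 6), ('C', 'D'): (0, 5),
             ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def best_response(game, opponent_move):
    # Row player's payoff-maximizing reply to a fixed opponent move.
    return max('CD', key=lambda m: game[(m, opponent_move)][0])

# PD: D is the best reply to everything, so no belief supports cooperation.
assert best_response(PD, 'C') == 'D' and best_response(PD, 'D') == 'D'
# Assurance: C is the best reply to C, so shared optimism solves the problem.
assert best_response(ASSURANCE, 'C') == 'C'
```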

  7. The Problem of Cooperation The problem of cooperation arises in several important contexts, including: • public goods: everyone can enjoy the good even if they don’t pay for it, e.g., nat’l defense, public tv. Subject to the free-rider problem: undersupplied by a voluntary contribution scheme. • common (property) resources: raw or natural resources that are owned by everyone (or no one), e.g., clean air, clean water, biodiversity. Subject to the “Tragedy of the Commons” (Hardin, 1968): overconsumed (depleted).

  8. Tragedy of the Global Commons? Can Selfishness Save the Environment? Our Common Future: transboundary pollution, ozone depletion, nuclear proliferation, global warming, loss of biodiversity, deforestation, and overfishing are all negative externalities, consequences of continuing economic growth and development (Brundtland, 1987). Arguments to the effect that “polluting is wrong” are less likely to be effective than measures that get the incentives right over the long run (Ridley & Low, 1993).

  9. Tragedy of the Global Commons? Consider a country deciding on its optimal level of economic growth (X) in the presence of a negative externality (e.g., transboundary pollution). National utility is a positive function of own growth and a negative function of overall growth:

  P(X, X') = a(X) - b(X, X') + c

where X is the country’s own choice of growth and X' is the set of all countries’ choices. Alternatively, X can be the voluntary contribution level in the case of a public good (bad), or the consumption level in the case of a common resource.

  10. Common Resource Game Two fishermen fish from a single lake. Each year, there are a fixed number of fish in the lake and two periods during the year in which they can be harvested, spring and fall. Each fisherman consumes all the fish he catches each period, and their identical preferences are described by the following consumption function: Ui = Cs x Cf, where Cs = spring catch and Cf = fall catch. Each spring, each fisherman decides how many fish to remove from the lake. In the fall, the remaining fish are equally divided between the two.

  11. Common Resource Game Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond. • Period 1: each fisherman chooses his consumption (c1, c2). • Period 2: the remaining fish are equally divided: (Y - (c1 + c2))/2. Utility is Ui = ct x ct+1, where ct is today’s consumption and ct+1 is tomorrow’s. (Figure: the reaction curves c1 = (Y - c2)/2 and c2 = (Y - c1)/2 in (c1, c2) space, intersecting at (Y/3, Y/3).)

  12. Common Resource Game Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond. • Period 1: each fisherman chooses his consumption (c1, c2). • Period 2: the remaining fish are equally divided: (Y - (c1 + c2))/2. Social optimality: c1 = c2 = Y/4. (Figure: the reaction curves c1 = (Y - c2)/2 and c2 = (Y - c1)/2, with the social optimum (Y/4, Y/4) inside the intersection at (Y/3, Y/3).)

  13. Common Resource Game Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond. • Period 1: each fisherman chooses his consumption (c1, c2). • Period 2: the remaining fish are equally divided: (Y - (c1 + c2))/2. If there are 12 fish in the pond, each will consume (Y/3 =) 4 in the spring and 2 in the fall in a NE. Both would be better off consuming (Y/4 =) 3 in the spring, leaving 3 for each in the fall. (Figure: reaction curves with the NE at (Y/3, Y/3) and the social optimum at (Y/4, Y/4).)
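The numbers on this slide can be verified directly. A quick numeric sketch (function names are mine): iterate the reaction functions c_i = (Y - c_j)/2 to find the Nash equilibrium, and compare utilities U = (spring catch) x (fall catch) with the social optimum, for Y = 12.

```python
Y = 12  # fish in the pond

def utility(ci, cj, Y=Y):
    # Each fisherman's fall share is half of what remains after spring.
    fall_share = (Y - (ci + cj)) / 2
    return ci * fall_share

# Best-response iteration on c_i = (Y - c_j)/2 converges to the NE c1 = c2 = Y/3.
c1 = c2 = 0.0
for _ in range(100):
    c1, c2 = (Y - c2) / 2, (Y - c1) / 2

assert abs(c1 - Y / 3) < 1e-6 and abs(c2 - Y / 3) < 1e-6
# NE: 4 in the spring, 2 in the fall -> U = 8 each.
assert abs(utility(Y / 3, Y / 3) - 8) < 1e-6
# Social optimum: 3 in the spring, 3 in the fall -> U = 9 each.
assert abs(utility(Y / 4, Y / 4) - 9) < 1e-6
```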

  14. Common Resource Game If there are 12 fish in the pond, each will consume (Y/3 =) 4 in the spring and 2 in the fall in a NE. Both would be better off consuming (Y/4 =) 3 in the spring, leaving 3 for each in the fall. With C = consume 3 in the spring and D = consume 4 in the spring, this is a Prisoner’s Dilemma:

        C          D
  C   9, 9     7.5, 10
  D   10, 7.5    8, 8

What would happen if the game were repeated?

  15. We Play a Game At each round of the game, you will have the chance to contribute to a public good (e.g., national defense; public tv). The game is repeated for several rounds, and payoffs are calculated as follows: 1 pt. for each contribution made by anyone, plus 3 pts. for each round you don’t contribute. See Holt and Laury, JEP 1997: 209-215.

  16. We Play a Game Payoffs: 1 pt. for each contribution made by anyone, plus 3 pts. for each round you don’t contribute. Assume n = 30. Your payoff, given the contribution rate (n-1) among the other players:

  Contribution rate (n-1):   0%   10%   ...   50%   ...   90%   100%
  Contribute:                 1     4          16          28     30
  Don’t:                      3     6          18          30     33

n-person Prisoner’s Dilemma: Don’t Contribute is a dominant strategy. But if no one contributes, the outcome is inefficient.
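The dominance claim can be sketched in a few lines (helper names are mine; I assume the contribution count in your payoff includes your own contribution, which matches most of the table’s entries):

```python
def payoff(you_contribute, others_contributing):
    # 1 pt. for each contribution made by anyone,
    # + 3 pts. for each round you don't contribute.
    total_contributions = others_contributing + (1 if you_contribute else 0)
    return total_contributions + (0 if you_contribute else 3)

# Free-riding earns exactly 2 more points whatever the others do:
# you keep the 3-point bonus but give up your own 1-point contribution.
for k in range(30):
    assert payoff(False, k) == payoff(True, k) + 2

# Yet all-contribute beats all-defect: 30 points each versus 3 each (n = 30).
assert payoff(True, 29) > payoff(False, 0)
```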

  17. We Play a Game Public Goods Games (Figure: number of contributions by round. Data from 2009. N = 20. Communication was allowed between rounds 7 and 8.)

  18. We Play a Game Public Goods Games Typically, contribution rates: • 40-60% in one-shot games & first round of repeated games • <30% on announced final round • Decrease with group size • Increase with “learning”

  19. Repeated Games Examples of Repeated Prisoner’s Dilemma • Overfishing and transboundary pollution: the Tragedy of the Global Commons • Cartel enforcement, labor unions, and public goods: free-rider problems

  20. Repeated Games Some Questions: • What happens when a game is repeated? • Can threats and promises about the future influence behavior in the present? • Cheap talk • Finitely repeated games: Backward induction • Indefinitely repeated games: Trigger strategies

  21. The Evolution of Cooperation Under what conditions will cooperation emerge in a world of egoists without central authority? Axelrod uses an experimental method – the indefinitely repeated PD tournament – to investigate a series of questions: Can a cooperative strategy gain a foothold in a population of rational egoists? Can it survive better than its uncooperative rivals? Can it resist invasion and eventually dominate the system?

  22. The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament Axelrod (1980a,b, Journal of Conflict Resolution). A group of scholars were invited to design strategies to play indefinitely repeated prisoner’s dilemmas in a round robin tournament. Contestants submitted computer programs that select an action, Cooperate or Defect, in each round of the game, and each entry was matched against every other, itself, and a control, RANDOM.

  23. The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament Axelrod (1980a,b, Journal of Conflict Resolution). Contestants did not know the length of the games. (The first tournament lasted 200 rounds; the second varied probabilistically with an average of 151.) The first tournament had 14 entrants, including game theorists, mathematicians, psychologists, political scientists, and others. Results were published and new entrants solicited. The second tournament included 62 entrants . . .

  24. The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament TIT FOR TAT won both tournaments! TFT cooperates in the first round, and then does whatever the opponent did in the previous round. TFT “was the simplest of all submitted programs and it turned out to be the best!” (31). TFT was submitted by Anatol Rapoport to both tournaments, even after contestants could learn from the results of the first.

  25. The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament TIT FOR TAT won both tournaments! In addition, Axelrod provides a “theory of cooperation” based on his analysis of the repeated prisoner’s dilemma game. In particular, if the “shadow of the future” looms large, then players may have an incentive to cooperate. A cooperative strategy such as TFT is “collectively stable.” He also offers an evolutionary argument, i.e., TFT wins in an evolutionary competition in which payoffs play the role of reproductive rates.

  26. The Evolution of Cooperation • The Indefinitely Repeated Prisoner’s Dilemma Tournament • This result has been so influential that “some authors use TIT FOR TAT as though it were a synonym for a self-enforcing, cooperative agreement” (Binmore, 1992, p. 433). And many have taken these results to have shown that TFT is the “best way to play” in IRPD. • While TFT won these, will it win every tournament? • Is showing that TFT is collectively stable equivalent to predicting a winner in the computer tournaments? • Is TFT evolutionarily stable?
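One way to see why “best in these tournaments” need not mean “best in every tournament” is to run a toy round-robin. This is a sketch in the spirit of Axelrod’s setup, not a reconstruction of it: the four strategies, the 200-round match length, and the fixed seed are all my assumptions. In this particular small field, ALWAYS DEFECT outscores TFT because it can exploit ALWAYS COOPERATE; change the field and the ranking changes.

```python
import random

random.seed(42)  # fixed seed so the toy tournament is reproducible

# Stage-game payoffs to the row move: T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]

def always_defect(my_hist, opp_hist):
    return 'D'

def always_cooperate(my_hist, opp_hist):
    return 'C'

def random_play(my_hist, opp_hist):
    return random.choice('CD')

def match_score(s1, s2, rounds=200):
    # Total payoff to s1 over a repeated game against s2.
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(m1, m2)]
        h1.append(m1)
        h2.append(m2)
    return total

strategies = {'TFT': tit_for_tat, 'ALL-D': always_defect,
              'ALL-C': always_cooperate, 'RANDOM': random_play}

# Each entry is matched against every other entry and itself.
totals = {name: sum(match_score(s, s2) for s2 in strategies.values())
          for name, s in strategies.items()}

print(sorted(totals, key=totals.get, reverse=True))
```

Note that TFT never beats its own opponent within a match (it can never score more than the other player); when it wins a tournament, it wins on totals across many pairings.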

  27. The Evolution of Cooperation Class Tournament Imagine a population of strategies matched in pairs to play repeated PD, where outcomes determine the number of offspring each leaves to the next generation. • In each generation, each strategy is matched against every other, itself, and RANDOM. • Between generations, the strategies reproduce, where the chance of successful reproduction (“fitness”) is determined by the payoffs (i.e., payoffs play the role of reproductive rates). Then, strategies that do better than average will grow as a share of the population and those that do worse than average will eventually die out. . .
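That selection step can be sketched with discrete replicator dynamics. This is an assumed model, not the course’s actual procedure, and the matrix entries are the long-run average per-round payoffs for ALWAYS DEFECT (D), ALWAYS COOPERATE (C), and TRIGGER (T) implied by the stage game T=5, R=3, P=1, S=0.

```python
# V[a][b] is what strategy a earns per round, on average, against strategy b.
V = {
    'D': {'D': 1, 'C': 5, 'T': 1},   # ALWAYS DEFECT
    'C': {'D': 0, 'C': 3, 'T': 3},   # ALWAYS COOPERATE
    'T': {'D': 1, 'C': 3, 'T': 3},   # TRIGGER
}

shares = {'D': 0.3, 'C': 0.3, 'T': 0.4}  # assumed starting population mix

for _ in range(200):
    # Expected payoff of each strategy against the current population mix.
    fitness = {s: sum(shares[o] * V[s][o] for o in shares) for s in shares}
    mean = sum(shares[s] * fitness[s] for s in shares)
    # Payoffs play the role of reproductive rates: above-average strategies grow.
    shares = {s: shares[s] * fitness[s] / mean for s in shares}

# Defectors thrive only while there are unconditional cooperators to exploit;
# once C is rare, TRIGGER's retaliation drives D out of the population.
assert shares['D'] < 0.01
assert shares['T'] > 0.5
```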

  28. Repeated Games Some Questions: • What happens when a game is repeated? • Can threats and promises about the future influence behavior in the present? • Cheap talk • Finitely repeated games: Backward induction • Indefinitely repeated games: Trigger strategies

  29. Repeated Games Can threats and promises about future actions influence behavior in the present? Consider the following game, played 2x (see Gibbons: 82-104):

        C     D
  C   3,3   0,5
  D   5,0   1,1

  30. Repeated Games Draw the extensive form game. Each first-round outcome, (3,3), (0,5), (5,0), or (1,1), is followed by a second play of the stage game, so the terminal payoffs are two-round sums: (6,6), (3,8), (8,3), (4,4) after first-round (C,C); (3,8), (0,10), (5,5), (1,6) after (C,D); (8,3), (5,5), (10,0), (6,1) after (D,C); and (4,4), (1,6), (6,1), (2,2) after (D,D).

  31. Repeated Games Now, consider three repeated game strategies: D (ALWAYS DEFECT): Defect on every move. C (ALWAYS COOPERATE): Cooperate on every move. T (TRIGGER): Cooperate on the first move, then continue to cooperate as long as the other cooperates. If the other ever defects, defect forever.

  32. Repeated Games If the game is played twice, the V(alue) to a player using ALWAYS DEFECT (D) against an opponent using ALWAYS DEFECT (D) is V(D/D) = 1 + 1 = 2, and so on:
  V(D/D) = 1 + 1 = 2
  V(C/C) = 3 + 3 = 6
  V(T/T) = 3 + 3 = 6
  V(D/C) = 5 + 5 = 10
  V(D/T) = 5 + 1 = 6
  V(C/D) = 0 + 0 = 0
  V(C/T) = 3 + 3 = 6
  V(T/D) = 0 + 1 = 1
  V(T/C) = 3 + 3 = 6

  33. Repeated Games And 3x:
  V(D/D) = 1 + 1 + 1 = 3
  V(C/C) = 3 + 3 + 3 = 9
  V(T/T) = 3 + 3 + 3 = 9
  V(D/C) = 5 + 5 + 5 = 15
  V(D/T) = 5 + 1 + 1 = 7
  V(C/D) = 0 + 0 + 0 = 0
  V(C/T) = 3 + 3 + 3 = 9
  V(T/D) = 0 + 1 + 1 = 2
  V(T/C) = 3 + 3 + 3 = 9

  34. Repeated Games Time average payoffs (n = 3):
  V(D/D) = (1 + 1 + 1)/3 = 1
  V(C/C) = (3 + 3 + 3)/3 = 3
  V(T/T) = (3 + 3 + 3)/3 = 3
  V(D/C) = (5 + 5 + 5)/3 = 5
  V(D/T) = (5 + 1 + 1)/3 = 7/3
  V(C/D) = (0 + 0 + 0)/3 = 0
  V(C/T) = (3 + 3 + 3)/3 = 3
  V(T/D) = (0 + 1 + 1)/3 = 2/3
  V(T/C) = (3 + 3 + 3)/3 = 3

  35. Repeated Games Time average payoffs, general n:
  V(D/D) = (1 + 1 + 1 + ...)/n = 1
  V(C/C) = (3 + 3 + 3 + ...)/n = 3
  V(T/T) = (3 + 3 + 3 + ...)/n = 3
  V(D/C) = (5 + 5 + 5 + ...)/n = 5
  V(D/T) = (5 + 1 + 1 + ...)/n = 1 + e
  V(C/D) = (0 + 0 + 0 + ...)/n = 0
  V(C/T) = (3 + 3 + 3 + ...)/n = 3
  V(T/D) = (0 + 1 + 1 + ...)/n = 1 - e
  V(T/C) = (3 + 3 + 3 + ...)/n = 3
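These totals and time averages can be reproduced by direct simulation. A sketch (the strategy encoding is mine; stage payoffs are R=3, T=5, P=1, S=0):

```python
# Stage-game payoff to the row move.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def next_move(strategy, opp_history):
    if strategy == 'ALL-D':
        return 'D'
    if strategy == 'ALL-C':
        return 'C'
    # TRIGGER: cooperate until the opponent ever defects, then defect forever.
    return 'D' if 'D' in opp_history else 'C'

def value(s1, s2, n):
    # Total payoff to s1 over n rounds against s2.
    h1, h2, total = [], [], 0
    for _ in range(n):
        m1, m2 = next_move(s1, h2), next_move(s2, h1)
        total += PAYOFF[(m1, m2)]
        h1.append(m1)
        h2.append(m2)
    return total

# The n = 3 values from the slide:
assert value('ALL-D', 'TRIGGER', 3) == 7    # 5 + 1 + 1
assert value('TRIGGER', 'ALL-D', 3) == 2    # 0 + 1 + 1
assert value('TRIGGER', 'TRIGGER', 3) == 9  # 3 + 3 + 3
# Time averages approach 1 + e (and 1 - e) as n grows:
assert abs(value('ALL-D', 'TRIGGER', 1000) / 1000 - 1) < 0.01
```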

  36. Repeated Games Now draw the matrix form of this game (one-shot payoffs, 1x):

        C     D     T
  C   3,3   0,5   3,3
  D   5,0   1,1   5,0
  T   3,3   0,5   3,3

  37. Repeated Games Time Average Payoffs If the game is repeated, ALWAYS DEFECT is no longer dominant.

        C          D           T
  C   3,3        0,5         3,3
  D   5,0        1,1         1+e, 1-e
  T   3,3        1-e, 1+e    3,3

  38. Repeated Games … and TRIGGER achieves “a NE with itself.”

        C          D           T
  C   3,3        0,5         3,3
  D   5,0        1,1         1+e, 1-e
  T   3,3        1-e, 1+e    3,3

  39. Repeated Games Time Average Payoffs T(emptation) > R(eward) > P(unishment) > S(ucker)

        C          D           T
  C   R,R        S,T         R,R
  D   T,S        P,P         P+e, P-e
  T   R,R        P-e, P+e    R,R

  40. Discounting The discount parameter, d, is the weight of the next payoff relative to the current payoff. In an indefinitely repeated game, d can also be interpreted as the likelihood of the game continuing for another round (so that the expected number of moves per game is 1/(1-d)). The V(alue) to someone using ALWAYS DEFECT (D) when playing against someone using TRIGGER (T) is the sum of T for the first move, dP for the second, d²P for the third, and so on (Axelrod: 13-4): V(D/T) = T + dP + d²P + … “The Shadow of the Future”

  41. Discounting Writing this as V(D/T) = T + dP + d²P + ..., we have the following:
  V(D/D) = P + dP + d²P + ... = P/(1-d)
  V(C/C) = R + dR + d²R + ... = R/(1-d)
  V(T/T) = R + dR + d²R + ... = R/(1-d)
  V(D/C) = T + dT + d²T + ... = T/(1-d)
  V(D/T) = T + dP + d²P + ... = T + dP/(1-d)
  V(C/D) = S + dS + d²S + ... = S/(1-d)
  V(C/T) = R + dR + d²R + ... = R/(1-d)
  V(T/D) = S + dP + d²P + ... = S + dP/(1-d)
  V(T/C) = R + dR + d²R + ... = R/(1-d)
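The closed forms above can be checked numerically. A sketch with assumed concrete payoffs (T=5, R=3, P=1, S=0) and d = 0.9, truncating the infinite sums at a large horizon:

```python
T, R, P, S = 5, 3, 1, 0  # assumed standard PD payoffs
d = 0.9                  # discount parameter

def discounted(stream, horizon=2000):
    # stream(t) gives the stage payoff in round t = 0, 1, 2, ...
    # the geometric tail beyond the horizon is negligible for d = 0.9.
    return sum(d**t * stream(t) for t in range(horizon))

# V(D/T): T in round 0, then P forever, = T + dP/(1-d).
v_dt = discounted(lambda t: T if t == 0 else P)
assert abs(v_dt - (T + d * P / (1 - d))) < 1e-6

# V(T/T): R forever, = R/(1-d).
v_tt = discounted(lambda t: R)
assert abs(v_tt - R / (1 - d)) < 1e-6
```

With these numbers v_dt = 14 and v_tt = 30, so at d = 0.9 defecting against TRIGGER already pays less than cooperating with it.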

  42. Discounting Discounted Payoffs: T > R > P > S, 0 < d < 1

        C                      D                             T
  C   R/(1-d), R/(1-d)     S/(1-d), T/(1-d)              R/(1-d), R/(1-d)
  D   T/(1-d), S/(1-d)     P/(1-d), P/(1-d)              T + dP/(1-d), S + dP/(1-d)
  T   R/(1-d), R/(1-d)     S + dP/(1-d), T + dP/(1-d)    R/(1-d), R/(1-d)

  43. Discounting Discounted Payoffs: T > R > P > S, 0 < d < 1. T weakly dominates C.

        C                      D                             T
  C   R/(1-d), R/(1-d)     S/(1-d), T/(1-d)              R/(1-d), R/(1-d)
  D   T/(1-d), S/(1-d)     P/(1-d), P/(1-d)              T + dP/(1-d), S + dP/(1-d)
  T   R/(1-d), R/(1-d)     S + dP/(1-d), T + dP/(1-d)    R/(1-d), R/(1-d)

  44. Discounting Now consider what happens to these values as d varies (from 0 to 1):
  V(D/D) = P + dP + d²P + ... = P/(1-d)
  V(C/C) = R + dR + d²R + ... = R/(1-d)
  V(T/T) = R + dR + d²R + ... = R/(1-d)
  V(D/C) = T + dT + d²T + ... = T/(1-d)
  V(D/T) = T + dP + d²P + ... = T + dP/(1-d)
  V(C/D) = S + dS + d²S + ... = S/(1-d)
  V(C/T) = R + dR + d²R + ... = R/(1-d)
  V(T/D) = S + dP + d²P + ... = S + dP/(1-d)
  V(T/C) = R + dR + d²R + ... = R/(1-d)

  45. Discounting Now consider what happens to these values as d varies (from 0 to 1):
  V(D/D) = P + dP + d²P + ... = P/(1-d)
  V(C/C) = R + dR + d²R + ... = R/(1-d)
  V(T/T) = R + dR + d²R + ... = R/(1-d)
  V(D/C) = T + dT + d²T + ... = T/(1-d)
  V(D/T) = T + dP + d²P + ... = T + dP/(1-d)
  V(C/D) = S + dS + d²S + ... = S/(1-d)
  V(C/T) = R + dR + d²R + ... = R/(1-d)
  V(T/D) = S + dP + d²P + ... = S + dP/(1-d)
  V(T/C) = R + dR + d²R + ... = R/(1-d)

  46. Discounting Now consider what happens to these values as d varies (from 0 to 1):
  V(D/D) = P + dP + d²P + ... = P + dP/(1-d)
  V(C/C) = R + dR + d²R + ... = R/(1-d)
  V(T/T) = R + dR + d²R + ... = R/(1-d)
  V(D/C) = T + dT + d²T + ... = T/(1-d)
  V(D/T) = T + dP + d²P + ... = T + dP/(1-d)
  V(C/D) = S + dS + d²S + ... = S/(1-d)
  V(C/T) = R + dR + d²R + ... = R/(1-d)
  V(T/D) = S + dP + d²P + ... = S + dP/(1-d)
  V(T/C) = R + dR + d²R + ... = R/(1-d)
Since P > S, V(D/D) > V(T/D): D is a best response to D.

  47. Discounting Now consider what happens to these values as d varies (from 0 to 1):
  V(D/D) = P + dP + d²P + ... = P + dP/(1-d)
  V(C/C) = R + dR + d²R + ... = R/(1-d)
  V(T/T) = R + dR + d²R + ... = R/(1-d)
  V(D/C) = T + dT + d²T + ... = T/(1-d)
  V(D/T) = T + dP + d²P + ... = T + dP/(1-d)
  V(C/D) = S + dS + d²S + ... = S/(1-d)
  V(C/T) = R + dR + d²R + ... = R/(1-d)
  V(T/D) = S + dP + d²P + ... = S + dP/(1-d)
  V(T/C) = R + dR + d²R + ... = R/(1-d)

  48. Discounting Now consider what happens to these values as d varies (from 0 to 1). For all values of d:
  V(D/T) > V(D/D) > V(T/D)
  V(T/T) > V(D/D) > V(T/D)
Is there a value of d s.t. V(D/T) = V(T/T)? Call this d*. If d < d*, the following ordering holds:
  V(D/T) > V(T/T) > V(D/D) > V(T/D)
D is dominant: GAME SOLVED. Solving for d*:
  V(D/T) = V(T/T)
  T + dP/(1-d) = R/(1-d)
  T - dT + dP = R
  T - R = d(T - P)
  d* = (T - R)/(T - P)
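Plugging in the standard payoffs (T=5, R=3, P=1, S=0, assumed here for illustration) gives d* = (5-3)/(5-1) = 1/2, and the crossover behaves exactly as claimed:

```python
T, R, P, S = 5, 3, 1, 0  # assumed standard PD payoffs

d_star = (T - R) / (T - P)
assert d_star == 0.5

def v_defect_vs_trigger(d):
    # V(D/T) = T + dP/(1-d)
    return T + d * P / (1 - d)

def v_trigger_vs_trigger(d):
    # V(T/T) = R/(1-d)
    return R / (1 - d)

assert v_defect_vs_trigger(0.4) > v_trigger_vs_trigger(0.4)  # d < d*: defect pays
assert v_defect_vs_trigger(0.6) < v_trigger_vs_trigger(0.6)  # d > d*: cooperation pays
assert abs(v_defect_vs_trigger(0.5) - v_trigger_vs_trigger(0.5)) < 1e-9  # equal at d*
```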

  49. Discounting Now consider what happens to these values as d varies (from 0 to 1). For all values of d:
  V(D/T) > V(D/D) > V(T/D)
  V(T/T) > V(D/D) > V(T/D)
Is there a value of d s.t. V(D/T) = V(T/T)? Call this d* = (T-R)/(T-P). If d > d*, the following ordering holds:
  V(T/T) > V(D/T) > V(D/D) > V(T/D)
D is a best response to D; T is a best response to T; multiple NE.

  50. Discounting Graphically: the V(alue) to a player using ALWAYS DEFECT (D) against TRIGGER (T), and V(T/T), as functions of the discount parameter (d). (Figure: V on the vertical axis, d on the horizontal axis from 0 to 1; V(D/T) = T + dP/(1-d) starts at T, V(T/T) = R/(1-d) starts at R, and the two curves cross at d*.)
