Unit III: The Evolution of Cooperation

Unit III: The Evolution of Cooperation • Can Selfishness Save the Environment? • Repeated Games: the Folk Theorem • Evolutionary Games • A Tournament • How to Promote Cooperation/Unit Review 4/16

Today’s Agenda • Bounded Rationality • Designing Repeated Game Strategies • Finite Automata • Tournament Assignment • How to Promote Cooperation • The Fisherman Problem (Again)

Bounded Rationality In the Repeated Prisoner’s Dilemma, it has been suggested that “uncooperative behavior is the result of ‘unbounded rationality’, i.e., the assumed availability of unlimited reasoning and computational resources to the players” (Papadimitriou, 1992: 122). If players are boundedly rational, on the other hand, the cooperative outcome may emerge as the result of a “muddling” process: such players reason inductively and adapt (imitate or learn) locally superior strategies. Thus, not only is bounded rationality a more “realistic” approach, it may also solve some deep analytical problems, e.g., resolution of finite horizon paradoxes.

Designing Repeated Game Strategies Imagine a very simple decision making machine playing a repeated game. The machine has very little information at the start of the game: no knowledge of the payoffs or “priors” over the opponent’s behavior. It merely makes a choice, receives a payoff, then adapts its behavior, and so on. The machine, though very simple, is able to implement a strategy against any possible opponent, i.e., it “knows what to do” in any possible situation of the game.

Designing Repeated Game Strategies A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far. [Figure: both players’ past actions (C or D) in periods …, T-3, T-2, T-1, with the action at time T0 marked “?”. Caption: History at time T0.]

Designing Repeated Game Strategies A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far, subject to the constraint of a finite memory. [Figure: both players’ past actions (C or D) in periods …, T-3, T-2, T-1, with the action at time T0 marked “?”. Caption: History of memory-4.]

Designing Repeated Game Strategies TIT FOR TAT is a remarkably simple repeated game strategy. It merely requires recall of what happened in the last round (memory-1). [Figure: both players’ past actions (C or D) in periods …, T-3, T-2, T-1, with the action at time T0 marked “?”. Caption: History of memory-1.]
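As a rough illustration of “a map from a history to an action,” the sketch below writes TIT FOR TAT as a Python function of the history. Representing a history as a list of (own action, opponent action) pairs, and the helper name truncate, are assumptions made for this example; they are not part of the course software.

```python
from typing import List, Tuple

# Assumed representation: a history is a list of
# (my_action, opponent_action) pairs, oldest round first.
History = List[Tuple[str, str]]


def tit_for_tat(history: History) -> str:
    """Cooperate in the first round, then copy the opponent's last action."""
    if not history:
        return "C"
    return history[-1][1]


def truncate(history: History, memory: int) -> History:
    """Keep only the last `memory` rounds (the finite-memory constraint)."""
    return history[-memory:] if memory > 0 else []


history = [("C", "C"), ("C", "D"), ("D", "C"), ("C", "D")]
# TIT FOR TAT needs only memory-1, so truncating changes nothing.
print(tit_for_tat(history), tit_for_tat(truncate(history, 1)))
```

Because the function only ever looks at the last pair, feeding it the full history or a memory-1 truncation gives the same action.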

Finite Automata A FINITE AUTOMATON (FA) is a mathematical representation of a simple decision-making process. FA are completely described by: • A finite set of internal states • An initial state • An output function • A transition function The output function determines an action, C or D, in each state. The transition function determines how the FA changes states in response to the inputs it receives (e.g., actions of other FA). (Rubinstein, “Finite Automata Play the Repeated PD,” JET, 1986)

Finite Automata FA will implement a strategy against any possible opponent, i.e., they “know what to do” in any possible situation of the game. FA meet in 2-player repeated games and make a move in each round (either C or D). Depending upon the outcome of that round, they “decide” what to play on the next round, and so on. FA are very simple, have no knowledge of the payoffs or priors over the opponent’s behavior, and no deductive ability. They simply read and react to what happens. Nonetheless, they are capable of a crude form of “learning” — they receive payoffs that reinforce certain behaviors and “punish” others.
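The four components of an FA can be written down directly as data. The following is a minimal sketch, assuming integer state labels and a dictionary encoding chosen for this example; the class name Automaton and the helper respond are hypothetical, and the three-state encoding of TIT FOR TWO TATS is one common way to represent that strategy rather than a transcription of the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Automaton:
    initial: int                              # initial state
    output: Dict[int, str]                    # state -> action (C or D)
    transition: Dict[Tuple[int, str], int]    # (state, opponent action) -> next state


TIT_FOR_TAT = Automaton(
    initial=0,
    output={0: "C", 1: "D"},
    transition={(0, "C"): 0, (0, "D"): 1, (1, "C"): 0, (1, "D"): 1},
)

# Defects only after two consecutive defections by the opponent.
TIT_FOR_TWO_TATS = Automaton(
    initial=0,
    output={0: "C", 1: "C", 2: "D"},
    transition={
        (0, "C"): 0, (0, "D"): 1,
        (1, "C"): 0, (1, "D"): 2,
        (2, "C"): 0, (2, "D"): 2,
    },
)


def respond(fa: Automaton, opponent_moves: str) -> List[str]:
    """Return the actions the FA plays against a fixed sequence of opponent moves."""
    state = fa.initial
    played = [fa.output[state]]
    for move in opponent_moves:
        state = fa.transition[(state, move)]
        played.append(fa.output[state])
    return played


print(respond(TIT_FOR_TAT, "CDDC"))
print(respond(TIT_FOR_TWO_TATS, "CDDC"))
```

The transition table has an entry for every (state, input) pair, which is what lets the automaton “know what to do” in any situation.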

Finite Automata [Figure: state-transition diagram of “TIT FOR TAT”.]

Finite Automata [Figure: state-transition diagram of “TIT FOR TWO TATS”.]

Finite Automata Some examples: [Figure: state-transition diagrams of “ALWAYS DEFECT”, “TIT FOR TAT”, “GRIM (TRIGGER)”, “PAVLOV”, and “M5”; START marks the initial state.]

Calculating Automata Payoffs Time-average payoffs can be calculated because any pair of FA will achieve cycles, since each FA takes as input only the actions in the previous period (i.e., it is “Markovian”). For example, consider the following pair of FA: [Figure: state-transition diagrams of “PAVLOV” and “M5”.]

Calculating Automata Payoffs First-round actions: PAVLOV plays C, M5 plays D. [Figure: state-transition diagrams of “PAVLOV” and “M5”.]

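Calculating Automata Payoffs
    Round            1  2  3  4  5  6  7  8
    PAVLOV payoff    0  5  1  0  5  1  0  5    AVG = 2
    PAVLOV action    C  D  D  C  D  D  C  D
    M5 action        D  C  D  D  C  D  D  C
    M5 payoff        5  0  1  5  0  1  5  0    AVG = 2
The pair settles into a three-round cycle (rounds 1-3, 4-6, …), so each automaton’s time-average payoff is (0 + 5 + 1)/3 = 2. [Figure: state-transition diagrams of “PAVLOV” and “M5”.]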
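The cycle argument can be sketched in code: simulate the pair until their joint internal state repeats, then average payoffs over the repeating segment. This is an illustrative sketch only. M5 is not reproduced because its diagram is not recoverable from the text, so PAVLOV is paired with ALWAYS DEFECT and TIT FOR TAT instead. The payoffs T=5, P=1, S=0 match the values in the table above; R=3 is an assumption (the standard Axelrod value), and the tuple encoding and the function name cycle_average are hypothetical.

```python
from fractions import Fraction

# (my action, other's action) -> (my payoff, other's payoff); R=3 is assumed.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Each automaton: (initial state, output by state, transition on opponent action).
PAVLOV = (0, {0: "C", 1: "D"},
          {(0, "C"): 0, (0, "D"): 1, (1, "C"): 1, (1, "D"): 0})
TIT_FOR_TAT = (0, {0: "C", 1: "D"},
               {(0, "C"): 0, (0, "D"): 1, (1, "C"): 0, (1, "D"): 1})
ALWAYS_DEFECT = (0, {0: "D"}, {(0, "C"): 0, (0, "D"): 0})


def cycle_average(a, b):
    """Time-average payoffs (to a, to b) over the cycle the pair falls into."""
    sa, sb = a[0], b[0]
    first_seen = {}   # joint state -> round index at which it first occurred
    payoffs = []
    while (sa, sb) not in first_seen:
        first_seen[(sa, sb)] = len(payoffs)
        xa, xb = a[1][sa], b[1][sb]
        payoffs.append(PAYOFF[(xa, xb)])
        sa, sb = a[2][(sa, xb)], b[2][(sb, xa)]
    cycle = payoffs[first_seen[(sa, sb)]:]   # drop any pre-cycle rounds
    n = len(cycle)
    return (Fraction(sum(p[0] for p in cycle), n),
            Fraction(sum(p[1] for p in cycle), n))


print(cycle_average(PAVLOV, ALWAYS_DEFECT))
print(cycle_average(TIT_FOR_TAT, ALWAYS_DEFECT))
```

The loop must terminate because the number of joint states is finite, which is the same reason any pair of FA eventually cycles.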

The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament Axelrod (1980a,b, Journal of Conflict Resolution). A group of scholars were invited to design strategies to play indefinitely repeated prisoner’s dilemmas in a round robin tournament. Contestants submitted computer programs that select an action, Cooperate or Defect, in each round of the game, and each entry was matched against every other, itself, and a control, RANDOM. . .

The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament Axelrod (1980a,b, Journal of Conflict Resolution). Contestants did not know the length of the games. (The first tournament lasted 200 rounds; the second varied probabilistically with an average of 151.) The first tournament had 14 entrants, including game theorists, mathematicians, psychologists, political scientists, and others. Results were published and new entrants solicited. The second tournament included 62 entrants . . .

The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament TIT FOR TAT won both tournaments! TFT cooperates in the first round, and then does whatever the opponent did in the previous round. TFT “was the simplest of all submitted programs and it turned out to be the best!” (31). TFT was submitted by Anatol Rapoport to both tournaments, even after contestants could learn from the results of the first.

The Evolution of Cooperation The Indefinitely Repeated Prisoner’s Dilemma Tournament This result has been so influential that “some authors use TIT FOR TAT as though it were a synonym for a self-enforcing, cooperative agreement” (Binmore, 1992, p. 433). And many have taken these results to have shown that TFT is the “best way to play” in IRPD. • While TFT won these, will it win every tournament? • Is showing that TFT is collectively stable equivalent to predicting a winner in the computer tournaments? • Is TFT evolutionarily stable?

The Trouble with TIT FOR TAT TIT FOR TAT is susceptible to 2 types of perturbations: Mutations: random Cs can invade TFT (TFT is not ESS), which in turn allows exploiters to gain a foothold. Noise: a “mistake” between a pair of TFTs induces CD, DC cycles (“mirroring” or “echo” effect). TIT FOR TAT never beats its opponent; it wins because it elicits reciprocal cooperation. It never exploits “naively” nice strategies. (See Poundstone: 242-248; Casti 76-84.)
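The “echo” effect is easy to reproduce. The sketch below, written for illustration with a hypothetical function name, pairs two TIT FOR TAT players and flips one intended move of player A in a single round; every other move is implemented as intended.

```python
def play_tft_pair(rounds, error_round):
    """Two TIT FOR TAT players; player A's move is flipped once, in
    round `error_round` (0-based). Returns the outcomes as 'AB' strings."""
    last_a, last_b = "C", "C"   # both open cooperatively
    outcomes = []
    for t in range(rounds):
        a, b = last_b, last_a   # each copies the other's previous move
        if t == error_round:
            a = "D" if a == "C" else "C"   # the single "mistake"
        outcomes.append(a + b)
        last_a, last_b = a, b
    return outcomes


print(play_tft_pair(8, 2))
# ['CC', 'CC', 'DC', 'CD', 'DC', 'CD', 'DC', 'CD']
```

After the one mistake, the pair never returns to mutual cooperation on its own; only a second mistake (or a more forgiving rule) can break the alternation.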

The Evolution of Cooperation Class Tournament Imagine a population of strategies matched in pairs to play repeated PD, where outcomes determine the number of offspring each leaves to the next generation. • In each generation, each strategy is matched against every other, itself, and RANDOM. • Between generations, the strategies reproduce, where the chance of successful reproduction (“fitness”) is determined by the payoffs (i.e., payoffs play the role of reproductive rates). Then, strategies that do better than average will grow as a share of the population and those that do worse than average will eventually die out. . .

Tournament Assignment Design a strategy to play an Evolutionary Prisoner’s Dilemma Tournament. Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations.
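One way to picture the population update is the discrete-time replicator dynamic, in which each strategy’s share is multiplied by its fitness relative to the population average. The sketch below is an assumption about the general form of such an update, not a description of the tournament program. The two-strategy payoff matrix is illustrative: it uses per-round averages for TIT FOR TAT and ALWAYS DEFECT over 1000 noise-free rounds with assumed payoffs T=5, R=3, P=1, S=0.

```python
def replicator_step(shares, payoff):
    """One generation: share_i <- share_i * fitness_i / mean fitness."""
    n = len(shares)
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    mean = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / mean for s, f in zip(shares, fitness)]


def simulate(shares, payoff, generations):
    """Apply the replicator step repeatedly and return the final shares."""
    for _ in range(generations):
        shares = replicator_step(shares, payoff)
    return shares


# Rows/columns: 0 = TIT FOR TAT, 1 = ALWAYS DEFECT (illustrative values).
# TFT vs ALWAYS DEFECT earns (0 + 999*1)/1000; the reverse earns (5 + 999*1)/1000.
PAYOFF = [[3.0, 0.999],
          [1.004, 1.0]]

final = simulate([0.5, 0.5], PAYOFF, 50)
print(final)
```

Starting from equal shares, TIT FOR TAT’s fitness is well above average, so its share approaches 1 within a few dozen generations; with noise or more entrants the matrix, and possibly the outcome, would differ.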

Tournament Assignment To design your strategy, access the programs through your fas Unix account. The Finite Automaton Creation Tool (fa) will prompt you to create a finite automaton to implement your strategy. Select the number of internal states, designate the initial state, and define the output and transition functions, which together determine how an automaton “behaves.” The program also allows you to specify probabilistic output and transition functions. Simple probabilistic strategies such as GENEROUS TIT FOR TAT have been shown to perform particularly well in noisy environments, because they avoid costly sequences of alternating defections that undermine sustained cooperation.
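To see why generosity helps under noise, the following sketch compares a pair of TIT FOR TAT players with a pair of GENEROUS TIT FOR TAT players at 1% noise. It is a toy simulation, not the course’s tournament program: the generosity level of 0.1 (cooperate with probability 0.1 after an opponent’s defection), the payoffs T=5, R=3, P=1, S=0, the round count, and the fixed random seed are all assumptions made for this example.

```python
import random

# Row player's payoff for (own action, opponent action); values are assumed.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def flip(action):
    return "D" if action == "C" else "C"


def average_payoff(generosity, rounds=20000, noise=0.01, seed=0):
    """Average per-round payoff to player A when both players use TIT FOR TAT
    that forgives a defection with probability `generosity` (0 = plain TFT)."""
    rng = random.Random(seed)
    last_a, last_b = "C", "C"
    total = 0
    for _ in range(rounds):
        moves = []
        for opponent_last in (last_b, last_a):   # A responds to B, B to A
            forgive = rng.random() < generosity
            intended = "C" if opponent_last == "C" or forgive else "D"
            if rng.random() < noise:             # implementation error
                intended = flip(intended)
            moves.append(intended)
        a, b = moves
        total += PAYOFF[(a, b)]
        last_a, last_b = a, b
    return total / rounds


tft = average_payoff(generosity=0.0)
gtft = average_payoff(generosity=0.1)
print(tft, gtft)
```

In this setup the generous pair recovers from a mistake after a handful of rounds, while the plain TIT FOR TAT pair stays locked in retaliation until another mistake happens to break it.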

Tournament Assignment Some examples: [Figure: state-transition diagrams of ALWAYS DEFECT, TIT FOR TAT, and GENEROUS PAVLOV; START marks the initial state, and one transition is labeled “.9D”.] A number of test runs will be held and results will be distributed to the class. You can revise your strategy as often as you like before the final submission date. You can also create your own tournament environment and test various designs before submitting. Entries must be submitted by 5pm, Friday, May 9.

Computer Instructions Creating your automaton To create a finite automaton (fa) you need to run the fa creation program. Log into your unix account and at the % prompt, type:
    ~neugebor/simulation/fa
From there, simply follow the instructions provided. Use your user name as the name for the fa. If anything goes wrong, simply press “ctrl-c” and start over.

Computer Instructions Creating your automaton The program prompts the user to: • specify the number of states in the automaton, with an upper limit of 50. For each state, the program asks: • “choose an action (cooperate or defect);” and • “in response to cooperate (defect), transition to what state?” Finally, the program asks: • specify the initial state. The program also allows the user to specify probabilistic outputs and transitions.

Computer Instructions Submitting your automaton After creating the fa, submit it by typing:
    cp username.fa ~neugebor/ece08
    chmod 744 ~neugebor/ece08/username.fa
where username is your user name. You may resubmit as often as you like before the submission deadline.

Computer Instructions Testing your automaton You may wish to test your fa before submitting it. You can do this by running sample tournaments with different fa’s you’ve created. To run the tournament program, you must copy it into your own account. You can do this by typing:
    mkdir simulation
    cp ~neugebor/simulation/* simulation
To change into the directory with the tournament program, type:
    cd simulation
Then, to run the tournament, type:
    ./tournament
NOTE: To run the tournament, you must be logged on to an iceserver.

Computer Instructions Testing your automaton Follow the instructions provided. Note that running a tournament with many fa’s can be computationally intensive and may take a long time to complete. Use your favorite text editor to view the results of the tournament (“less” is an easy option if you are unfamiliar with unix; type “less textfilename” to open a text file). To create extra automata to test in your tournament, type:
    ./fa
Name each fa whatever you want by entering any name you wish to use instead of your user name. Initially six different kinds of fa’s are in the directory: D, C, TFT, GRIM, PAVLOV, and RANDOM. Experiment with these and others as you like.

How to Promote Cooperation • Advice to Participants • Advice to Reformers • The Role of Institutions • Learning to Cooperate • Unit Review

How to Promote Cooperation Axelrod offers two types of advice on how to promote cooperation (1984, pp.199-244): • Advice to Participants - the players in the game • Advice to Reformers - the rule makers

Advice to Participants How to Choose Effectively (Axelrod, 1984: 109-123.) • Don’t be envious • Don’t be the first to defect • Reciprocate both cooperation and defection • Don’t be too clever These are intended as the ingredients of a strategy that will, in the long range and against a wide range of opponents, advance the player’s interests.

Advice to Participants • Nice: Never be the first to defect. A nice strategy signals a willingness to cooperate and may induce reciprocal cooperation. Nice strategies did best in Axelrod’s tournaments. • Provocable: Punish defection. Don’t get fleeced by an exploiter. • Forgiving: Reciprocate cooperation. Triggers may be susceptible to misunderstandings, mistakes, etc., that can lead otherwise cooperative players into spirals of alternating or mutual defection. • Clear: Be easily understood and predictable. Don’t be too clever. A simple rule works best.

Advice to Participants Sucker the Simple? Recall that while TIT FOR TAT never beats its opponent, PAVLOV always defects against a naïve cooperator. Hence, the success of PAVLOV in newer tournaments may suggest it is wise to exploit the weak, both (i) for “egoistic” benefit; and (ii) to increase the overall fitness of the population. Either the simple will learn (not to let themselves be exploited), or they will be winnowed.

Advice to Reformers Axelrod offers five concrete suggestions on how “the strategic setting itself can be transformed in order to promote cooperation among the players” (124-141): • Enlarge the “shadow of the future” • Change the payoffs • Teach people to care about each other • Teach reciprocity • Improve recognition abilities

Advice to Reformers Repeated interactions provide the conditions necessary for cooperation by transforming the nature of the interaction in two ways: • “Enlarge the shadow of the future” • Increase the amount of information in the system. This may reduce strategic uncertainty (e) and allow players to coordinate their expectations and behavior on mutually beneficial outcomes. [Figure: the discount parameter d (vertical axis, up to 1) plotted against strategic uncertainty e (horizontal axis, from 0), with the threshold d* = (T-R)/(T-P) marked.]

Changing the Rules of the Game Axelrod and Keohane (1986) apply the lessons from The Evolution of Cooperation to international relations, arguing that “not only can actors in world politics pursue different strategies within an established context of interaction, they may also seek to alter the context through building institutions embodying particular principles, norms, rules, or procedures for the conduct of international relations” (p. 228). Building an institution implies changing the context within which states make their decisions, and this may make it possible to achieve cooperation where it had been inaccessible. Hence, institutions “contribute to cooperation (...) by changing the context within which states make decisions based on self-interests” (Keohane, 1984, p. 13).

Learning to Cooperate We have seen that whereas cooperation is irrational in a one-shot Prisoner’s Dilemma, it may be rational (i.e., achieved in a SPNE), if the game is repeated and “the shadow of the future” is sufficiently large: d > (T-R)/(T-P) (i) Repeated interaction is a necessary but not a sufficient condition for cooperation. In addition, players must have reason to believe the other will reciprocate. This involves judging intentions, considerations of fairness, (mis)communication, trust, deception, etc.

Learning to Cooperate The Folk Theorem The shaded area is the set of SPNE. The segment from (P,P) to (R,R) is the set of “collectively stable” strategies, for d > d*. [Figure: payoff space with corner points (S,T), (R,R), (T,S), and (P,P); the SPNE region is shaded.]

Learning to Cooperate Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond. • Period 1: each fisherman chooses how much to consume (c1, c2). • Period 2: the remaining fish are divided equally, (Y – (c1+c2))/2 each. Best responses: c1 = (Y – c2)/2 and c2 = (Y – c1)/2. NE: c1 = c2 = Y/3. Social Optimum: c1 = c2 = Y/4. [Figure: the two best-response lines in (c1, c2) space, with Y/4 and Y/3 marked on each axis.]

Learning to Cooperate Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond. • Period 1: each fisherman chooses how much to consume (c1, c2). • Period 2: the remaining fish are divided equally, (Y – (c1+c2))/2 each. Best response: c1 = (Y – c2)/2. If there are 12 fish in the pond, each will consume (Y/3) 4 in the spring and 2 in the fall in a NE. Both would be better off consuming (Y/4) 3 in the spring, leaving 3 for each in the fall. [Figure: the two best-response lines in (c1, c2) space, with Y/4 and Y/3 marked on each axis.]

Learning to Cooperate If there are 12 fish in the pond, each will consume (Y/3) 4 in the spring and 2 in the fall in a NE. Both would be better off consuming (Y/4) 3 in the spring, leaving 3 for each in the fall. Let C = consume 3 in the spring and D = consume 4 in the spring:
                 C           D
        C      9, 9       7.5, 10
        D     10, 7.5      8, 8
A Prisoner’s Dilemma. What would happen if the game were repeated?
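The payoff matrix can be checked numerically. The slides do not state the fishermen’s utility function; the sketch below assumes each fisherman’s payoff is his spring consumption multiplied by his fall share, u_i = c_i (Y – c1 – c2)/2, which is the form that reproduces both the best-response rule c1 = (Y – c2)/2 and the numbers 9, 7.5, 10, and 8. The function names are hypothetical.

```python
def payoff(c_own, c_other, Y=12):
    """Assumed utility: spring consumption times the equal fall share."""
    fall_share = (Y - c_own - c_other) / 2
    return c_own * fall_share


def best_response(c_other, Y=12):
    """Maximizer of payoff(c, c_other) in c, i.e. c = (Y - c_other)/2."""
    return (Y - c_other) / 2


C, D = 3, 4   # cooperate = Y/4, defect = Y/3 when Y = 12
matrix = {(a, b): (payoff(a, b), payoff(b, a)) for a in (C, D) for b in (C, D)}

for a in (C, D):
    print([matrix[(a, b)] for b in (C, D)])
print(best_response(4))   # consuming 4 is a best response to 4
```

Under this assumed utility, consuming 4 is a best response to 4, so (D, D) is the Nash equilibrium even though (C, C) pays both fishermen more.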

Learning to Cooperate Imagine the fishermen make the following deal: Each will Cooperate (consume only 3) in the spring as long as the other does likewise; as soon as one Defects, the other will Defect forever, i.e., they adopt trigger strategies. This deal will be stable if the threat of future punishment makes both unwilling to Defect, i.e., if the one-period gain from Defect is not greater than the discounted future loss due to the punishment: (T – R) < (dR/(1-d) – dP/(1-d)) (ii)
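Rearranging condition (ii) gives (T – R)(1 – d) < d(R – P), i.e., d > (T – R)/(T – P), which is condition (i). As a quick numerical check, the sketch below applies both forms to the fishermen’s payoffs (T = 10, R = 9, P = 8); the function names are illustrative.

```python
def critical_discount(T, R, P):
    """Threshold d* = (T - R)/(T - P) from condition (i)."""
    return (T - R) / (T - P)


def trigger_is_stable(d, T, R, P):
    """Condition (ii): one-period gain from defecting is smaller than
    the discounted loss from being punished forever."""
    gain = T - R
    loss = d * R / (1 - d) - d * P / (1 - d)
    return gain < loss


T, R, P = 10, 9, 8   # fishermen's payoffs from the 2x2 game
print(critical_discount(T, R, P))        # 0.5
print(trigger_is_stable(0.6, T, R, P))   # True: 1 < 1.5
print(trigger_is_stable(0.4, T, R, P))   # False
```

For the fishermen the threshold is d* = 0.5, so the deal holds only if each values next season’s catch at more than half of this season’s.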

Learning to Cooperate Imagine there are many fishermen, each of whom can adopt either D(efect), C(ooperate), or T(rigger). In every generation, each fisherman plays against every other. After each generation, those that did poorly can switch to imitate those that did better. Eventually, C will die out, and the population will be dominated by either D or T, depending on the discount parameter. Noise (miscommunication) can also affect the outcome.

Unit Review Can Selfishness Save the Environment? Common property resources (e.g., clean air, clean water) will often be overconsumed and public goods (e.g., legal system, public radio) undersupplied because of the incentive to defect, or free-ride. These are examples of n-person Prisoner’s Dilemmas. Viewed as a one-shot interaction, the Prisoner’s Dilemma has pessimistic implications for rational behavior. Yet examples from biology and elsewhere suggest that strictly selfish behavior may take on a socially cooperative form in the long run, e.g., the “selfish gene” gives rise to kinship relations and altruism.

Unit Review Repeated Games: The Folk Theorem Analysis of repeated games suggests it may be possible for cooperation to emerge over the course of a long-term interaction. Theorem: Any payoff that Pareto-dominates the one-shot NE can be supported in a SPNE of the repeated game, if the discount parameter is sufficiently high. Axelrod (1984) argued that when the PD is repeated and the “shadow of the future” is large (d > d*), players will have an incentive to cooperate.

Unit Review The Indefinitely Repeated Prisoner’s Dilemma Tournament The success of TIT FOR TAT in tournaments of the repeated prisoner’s dilemma has led many to the optimistic conclusion that whereas rational players will choose to defect in a one-shot prisoner’s dilemma, repeated interaction may allow the evolution of cooperation over time. This is because TIT FOR TAT does well against other cooperative strategies and hence can grow as a proportion of the population over repeated plays of the game. Yet in evolutionary games, there is reason to believe that TIT FOR TAT will perform less well in noisy environments and over substantially longer (and more complex) time horizons.

Simulating Evolution [Figure: population share (vertical axis, 0.020 to 0.140) of each strategy over generations (horizontal axis, 0 to 800). Curves are numbered by each strategy’s position after the first generation; curve 1 is TFT.] Source: Axelrod 1984, p. 51.
