This presentation by Ion Juvina and Christian Lebiere from Carnegie Mellon University discusses the concept of transfer of learning within the context of strategic interaction games, specifically the Prisoner's Dilemma and the Chicken game. The authors outline foundational theories of transfer, the experimental design involving 480 participants, and the results indicating that mutual cooperation increases with awareness of interdependence. The study highlights both surface and deep similarities between games and examines the mechanisms underlying transfer in strategic scenarios.
Modeling transfer of learning in games of strategic interaction
Ion Juvina & Christian Lebiere
Department of Psychology, Carnegie Mellon University
ACT-R Workshop, July 2012
Outline
• Background
• Experiment
• Cognitive model
• Work in progress
• Discussion
Transfer of learning
• Alfred Binet (1899), formal discipline: exercise of mental faculties -> generalization
• Thorndike (1903), identical elements theory: transfer of learning occurs only when identical elements of behavior are carried over from one task to another
• Singley & Anderson (1989): surface vs. deep similarities; common “cognitive units”
Transfer in strategic interaction
• Bipartisan cooperation in Congress: does golf transfer to bipartisanship?
• If so, what is the similarity, and what exactly is transferred?
Prisoner’s Dilemma (PD) (figure)
Chicken game (CG) (figure)
PD & CG payoff matrices (figure)
Similarities between PD & CG
• Surface (near transfer):
  • 2×2 games
  • 2 symmetric and 2 asymmetric outcomes
  • The [1,1] outcome is identical
• Deep (far transfer):
  • Mixed motive
  • Non-zero sum
  • Mutual cooperation is superior to competition in the long term, though unstable (risky)
Differences between PD & CG
• Different equilibria:
  • Symmetric in PD: [-1,-1]
  • Asymmetric in CG: [-1,10] and [10,-1]
• Different strategies to maximize joint payoff (Pareto-efficient outcome):
  • [1,1] in PD
  • Alternation of [-1,10] and [10,-1] in CG
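To make this contrast concrete, here is a minimal sketch (not from the original materials) that computes the pure-strategy equilibria of both games. The slides give only some payoff cells ([1,1], PD's [-1,-1], and CG's [-1,10]/[10,-1]); the cells marked ASSUMED are hypothetical fillers chosen to preserve each game's standard structure.

```python
# payoffs[(row_move, col_move)] = (row_payoff, col_payoff); C = cooperate, D = defect
PD = {('C', 'C'): (1, 1),    ('C', 'D'): (-10, 10),   # ASSUMED off-diagonal cells
      ('D', 'C'): (10, -10), ('D', 'D'): (-1, -1)}
CG = {('C', 'C'): (1, 1),    ('C', 'D'): (-1, 10),
      ('D', 'C'): (10, -1),  ('D', 'D'): (-10, -10)}  # ASSUMED mutual-crash cell

def pure_nash(game):
    """Outcomes where neither player gains by unilaterally changing moves."""
    eq = []
    for (r, c) in game:
        row_best = all(game[(r, c)][0] >= game[(r2, c)][0] for r2 in 'CD')
        col_best = all(game[(r, c)][1] >= game[(r, c2)][1] for c2 in 'CD')
        if row_best and col_best:
            eq.append((r, c))
    return eq

print('PD equilibria:', pure_nash(PD))  # [('D', 'D')]: one symmetric equilibrium
print('CG equilibria:', pure_nash(CG))  # [('C', 'D'), ('D', 'C')]: two asymmetric
# Joint payoff per round: repeated [1,1] in PD gives 2; alternating
# [-1,10]/[10,-1] in CG gives 9, which is why alternation is Pareto-efficient there.
```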
Questions / hypotheses
• Similarities: identical elements? Common cognitive units?
• Transfer of learning:
  • Is there any transfer?
  • Only in one direction, from low- to high-entropy games? (Bednar, Chen, Xiao Liu, & Page, in press)
  • Identical elements would predict transfer in both directions
• Mechanism of transfer:
  • Reciprocal trust mitigates the risk associated with the long-term solution (Hardin, 2002)
Participants and design
• 480 participants (CMU students), forming 240 pairs
• 2 within-subjects games: PD & CG
• 4 between-subjects information conditions: No-info, Min-info, Mid-info, Max-info (60 pairs each)
• 2 between-subjects order conditions within each information condition: PD-CG and CG-PD (30 pairs each)
• 200 unnumbered rounds for each game
Typical outcomes (figure)
Pareto-optimal equilibria (figure)
[1,1] increases with info (figure)
Alternation increases with info (figure)
PD – CG sequence (figure)
CG – PD sequence (figure)
PD before and after (figure)
CG before and after (figure)
Transfer from PD to CG
• Increased [1,1] (surface transfer)
• Increased alternation (deep transfer)
Transfer from CG to PD
• Increased [1,1] (surface + deep transfer)
Divergent effects (PD -> CG): surface similarity maps PD’s [1,1] onto [1,1] in CG, while deep similarity maps it onto the alternation of [10,-1] / [-1,10], so the two similarities pull toward different outcomes.
Convergent effects (CG -> PD): surface similarity (the shared [1,1] outcome) and deep similarity (alternation of [10,-1] / [-1,10] as long-term cooperation) both map onto [1,1] in PD, so the two similarities reinforce each other.
Reciprocation by info (figure)
Payoff by info in PD and CG (figure)
Summary of results
• Mutual cooperation increases with awareness of interdependence (info)
• Transfer of learning:
  • Better performance “after” than “before”
  • Combined effects of surface and deep similarities:
    • CG -> PD: surface similarity facilitates transfer
    • PD -> CG: surface similarity interferes with transfer
  • Transfer occurs in both directions
• Mechanism of generalization: reciprocal trust?
Cognitive model
• Awareness of interdependence:
  • Opponent modeling
• Generality:
  • Utility learning (reinforcement learning)
• Transfer of learning:
  • Surface transfer
  • Deep transfer
Opponent modeling
• Instance-based learning:
  • Dynamic representation of the opponent
• Sequence learning:
  • Prediction of the opponent’s next move
• Instance (snapshot of the current situation):
  • Previous moves and the opponent’s current move
  • Contextualized expectations
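As an illustration only (this is not the authors' ACT-R code), a minimal sketch of instance-based opponent prediction: instances pair the previous moves with the opponent's next move, and retrieval is weighted by recency and frequency, a rough stand-in for ACT-R base-level activation. All names and parameter values are hypothetical.

```python
from collections import defaultdict

DECAY = 0.5  # decay parameter in the spirit of ACT-R base-level learning

class OpponentModel:
    """Instance-based prediction of the opponent's next move."""
    def __init__(self):
        self.instances = defaultdict(list)  # context -> [(opponent_move, time)]
        self.clock = 0

    def record(self, prev_moves, opponent_move):
        """Store an instance: the previous moves plus the opponent's current move."""
        self.clock += 1
        self.instances[prev_moves].append((opponent_move, self.clock))

    def predict(self, prev_moves):
        """Weight matching instances by recency and frequency (sum of t^-d)."""
        strength = defaultdict(float)
        for move, t in self.instances[prev_moves]:
            strength[move] += (self.clock - t + 1) ** -DECAY
        return max(strength, key=strength.get) if strength else None

model = OpponentModel()
model.record(('C', 'C'), 'D')  # after mutual cooperation, the opponent defected
model.record(('C', 'C'), 'D')
model.record(('C', 'C'), 'C')
print(model.predict(('C', 'C')))  # 'D': frequency outweighs the one recent 'C'
```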
Utility learning
• Reinforcement learning of a strategy: what move to make given
  • The expected move of the opponent
  • The context (previous moves)
• Reward functions:
  • Own payoff – opponent’s payoff
  • Opponent’s payoff
  • Joint payoff – opponent’s previous payoff
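A sketch of the three reward functions as stated on the slide, combined with ACT-R-style utility learning (U <- U + alpha * (R - U)); the state encoding and the parameter value are assumptions.

```python
ALPHA = 0.2  # learning rate (ASSUMED value); ACT-R utility learning rule below

# The three reward functions listed on the slide
def reward_selfish(own, opp, opp_prev):
    return own - opp                 # own payoff - opponent's payoff

def reward_generous(own, opp, opp_prev):
    return opp                       # opponent's payoff

def reward_cooperative(own, opp, opp_prev):
    return (own + opp) - opp_prev    # joint payoff - opponent's previous payoff

utilities = {}  # (context, expected_opponent_move, my_move) -> learned utility

def update_utility(state, reward):
    u = utilities.get(state, 0.0)
    utilities[state] = u + ALPHA * (reward - u)

# One learning step: we cooperated expecting cooperation, after a [1,1] outcome
# that followed an opponent payoff of 10 on the previous round
state = (('C', 'D'), 'C', 'C')
update_utility(state, reward_cooperative(own=1, opp=1, opp_prev=10))
print(utilities[state])  # 0.2 * ((1 + 1) - 10) = -1.6
```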
Surface transfer
• Declarative sub-symbolic learning:
  • Retrieval of instances guided by recency and frequency
• Strategy learning:
  • A learned strategy continues to be used for a while, until it is unlearned
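For reference, the recency-and-frequency bias on retrieval is ACT-R's standard base-level learning: for a chunk with $n$ presentations, time $t_j$ since the $j$-th presentation, and decay $d$,

$$B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right)$$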
Deep transfer
• Trust learning / trust dynamics
• Trust accumulator:
  • Increases when the opponent makes cooperative (risky) moves
  • Decreases when the opponent makes competitive moves
• Trust-invest accumulator:
  • Increases with mutually destructive outcomes
  • Decreases with unreciprocated cooperation (risk taking)
• A combined sketch of both accumulators follows the meta-strategy slide below
Meta-strategy
• Determines which reward function to use:
  • Trust accumulator <= 0: reward = own payoff – opponent’s payoff
  • Trust-invest accumulator > 0: reward = opponent’s payoff
  • Trust accumulator > 0: reward = joint payoff – opponent’s previous payoff
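A combined sketch of the trust accumulators and the meta-strategy. The slides give only the directions of change and the three cases, so the update magnitudes and the precedence among the (potentially overlapping) conditions are assumptions.

```python
class TrustState:
    """Trust accumulators plus the meta-strategy that picks a reward function."""
    def __init__(self):
        self.trust = 0.0         # trust accumulator
        self.trust_invest = 0.0  # trust-invest accumulator

    def update(self, opp_cooperated, mutual_destruction, unreciprocated):
        # Trust rises with the opponent's cooperative (risky) moves, falls otherwise
        self.trust += 1.0 if opp_cooperated else -1.0  # ASSUMED unit steps
        if mutual_destruction:
            self.trust_invest += 1.0   # invest in trust after mutual destruction
        if unreciprocated:
            self.trust_invest -= 1.0   # stop investing after unreciprocated risk

    def reward(self, own, opp, opp_prev):
        """Meta-strategy; the precedence among the three cases is ASSUMED."""
        if self.trust > 0:
            return (own + opp) - opp_prev  # pursue joint payoff
        if self.trust_invest > 0:
            return opp                     # reward the opponent's payoff
        return own - opp                   # trust <= 0: compete

state = TrustState()
state.update(opp_cooperated=False, mutual_destruction=True, unreciprocated=False)
print(state.reward(own=-1, opp=-1, opp_prev=-1))  # trust_invest > 0 -> opp payoff: -1
```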
Model diagram (figure, HSCB 2011): the environment passes the opponent’s move to ACT-R; declarative memory stores instances (previous moves plus the opponent’s current move) that support prediction of the opponent’s next move; procedural memory holds rules that pick the best response and learn from reward; the trust and trust-invest accumulators are implemented as an ACT-R extension.
PD-CG (figure)
CG-PD (figure)
PD-CG surface transfer (figure)
PD-CG deep transfer (figure)
CG-PD surface + deep transfer (figure)
Trust simulation (figure)
Summary of model results
• Awareness of interdependence: opponent modeling
• Generality: utility learning
• Transfer of learning:
  • Surface-level transfer: cognitive units
  • Deep-level transfer: trust
In progress
• Expand the model to account for all information conditions
• Develop a more ecologically valid paradigm (IPD^3)
• Model “affective” processes in ACT-R
IPD^3 (figure)
General discussion
• Transfer of learning is possible
• Deep similarities: interpersonal level
• IPD^3:
  • To be used in behavioral experiments
  • A tool for learning strategic-interaction skills
Acknowledgments
• Coty Gonzalez
• Jolie Martin
• Hau-Yu Wong
• Muniba Saleem
• This research is supported by the Defense Threat Reduction Agency (DTRA), grant number HDTRA1-09-1-0053, to Cleotilde Gonzalez and Christian Lebiere
Thank you for your attention! Questions?