
Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture. David V. Pynadath, Paul S. Rosenbloom, Stacy C. Marsella and Lingshan Li. 8.1.2013.


Presentation Transcript


  1. Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture. David V. Pynadath, Paul S. Rosenbloom, Stacy C. Marsella and Lingshan Li. 8.1.2013.

  2. Overall Desiderata for Sigma (𝚺)
     • A new breed of cognitive architecture that is
        • Grand unified
           • Cognitive + key non-cognitive (perception, motor, affective, …)
        • Functionally elegant
           • Broadly capable yet simple and theoretically elegant
           • “cognitive Newton’s laws”
        • Sufficiently efficient
           • Fast enough for anticipated applications
     • For virtual humans & intelligent agents/robots that are
        • Broadly, deeply and robustly cognitive
        • Interactive with their physical and social worlds
        • Adaptive given their interactions and experience
     Hybrid: Discrete + Continuous
     Mixed: Symbolic + Probabilistic

  3. Sample ICT Virtual Humans
     • Pictured: Gunslinger, Ada & Grace, INOTS, SASO
     • For education, training, interfaces, health, entertainment, …

  4. Theory of Mind (ToM) in Sigma
     • ToM models the minds of others, to enable for example:
        • Understanding multiagent situations
        • Participating in social interactions
     • ToM approach based on PsychSim (Marsella & Pynadath)
        • Decision-theoretic problem solving based on POMDPs
        • Recursive agent modeling
     • Questions to be answered
        • Can Sigma elegantly extend to comparable ToM?
        • What are the benefits for ToM?
        • What new phenomena emerge from this combination?
     • Results reported here concern:
        • Multiagent Sigma
        • Implementation of single-shot, two-player games
        • Both simultaneous and sequential moves

  5. The Structure of Sigma
     • Constructed in layers
     • In analogy to computer systems
     Layers, top to bottom (computer system / cognitive system, 𝚺):
        • Programs & Services / Knowledge & Skills
        • Computer Architecture / Cognitive Architecture
        • Microcode Architecture / Graphical Architecture
        • Hardware / Lisp
     Cognitive architecture: Predicates (WM), Conditionals (LTM); Perception, Memory Access, Decision, Learning, Action
     Graphical architecture: Graphical models, Piecewise linear functions; Graph Solution, Graph Modification
     Conditionals: Deep blending of rules and probabilistic networks
     Graphical models: Factor graphs + summary product algorithm

  6. Control Structure: Soar-like Nesting of Three Layers
     • A reactive layer
        • One (internally parallel) graph/cognitive cycle
     Which acts as the inner loop for
     • A deliberative layer
        • Serial selection and application of operators
     Which acts as the inner loop for
     • A reflective layer
        • Recursive, impasse-driven, meta-level generation
     • The layers differ in
        • Time scales
        • Serial versus parallel
        • Controlled versus uncontrolled
     [Figure labels: impasses Tie and No-Change]

  7. Single-Shot, Simultaneous-Move, Two-Player Games
     • Two players move simultaneously
     • Played only once (not repeated)
        • So no need to look beyond current decision
     • Symmetric and asymmetric games
     • Socially preferred outcome: optimum in some sense
     • Nash equilibrium: Neither player can unilaterally increase their payoff by altering their own choice
     • Key result: Sigma found the best Nash equilibrium in one memory access (i.e., graph solution)
        • Although linear combination in article can’t always guarantee it
     [Figure: payoff matrices for players A and B, annotated "602 Messages" and "962 Messages"]
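The Nash condition stated on this slide can be checked by brute force on a small payoff matrix. The following is a minimal sketch, not Sigma's graph-based solution: the function name `pure_nash_equilibria` and the Stag Hunt payoffs are illustrative assumptions, and only pure strategies are considered.

```python
from itertools import product

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return every (row, col) cell where neither player gains by deviating alone.

    payoff_a[r][c] and payoff_b[r][c] are the payoffs to A (row player) and
    B (column player) when A plays r and B plays c.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # A controls the row: compare against every row in the same column.
        a_ok = all(payoff_a[r][c] >= payoff_a[r2][c] for r2 in range(n_rows))
        # B controls the column: compare against every column in the same row.
        b_ok = all(payoff_b[r][c] >= payoff_b[r][c2] for c2 in range(n_cols))
        if a_ok and b_ok:
            equilibria.append((r, c))
    return equilibria

# Hypothetical Stag Hunt payoffs (action 0 = stag, action 1 = hare).
STAG_A = [[4, 0], [3, 3]]
STAG_B = [[4, 3], [0, 3]]
print(pure_nash_equilibria(STAG_A, STAG_B))
```

A game like this has more than one equilibrium, which is why picking out a "best" one (the socially preferred outcome) is a separate step from merely finding an equilibrium.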

  8. Sequential Games
     • Players (A, B) alternate moves
        • E.g., ultimatum, centipede and negotiation
     • Decision-theoretic approach with softmax combination
        • Use expected value at each level of search
        • Action probabilities assumed exponential in their utilities (à la Boltzmann)
     • There may be many Nash equilibria
        • Instead seek stricter concept of subgame perfection
        • Overall strategy is an equilibrium strategy over any subgame
     • Key result: Games solvable in two modes:
        • Automatic/reactive/system-1
        • Controlled/deliberate/system-2
     Both modes well documented in humans for general processing
     Combination not found previously in ToM models
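The softmax combination on this slide can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions: the names `boltzmann` and `expected_value` are hypothetical, the temperature is fixed at 1 because the slides do not give one, and the code stands in for the idea rather than for Sigma's message passing.

```python
import math

def boltzmann(utilities):
    """Map utilities to action probabilities proportional to exp(utility)."""
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def expected_value(utilities):
    """Expected utility at one level of search when actions follow boltzmann()."""
    probabilities = boltzmann(utilities)
    return sum(p * u for p, u in zip(probabilities, utilities))

# Example: four actions with increasing utility.
probs = boltzmann([0.1, 0.4, 0.7, 1.0])
print(probs, expected_value([0.1, 0.4, 0.7, 1.0]))
```

Unlike a hard maximum, this choice rule gives every action nonzero probability, so the value passed up to the previous level of search is a weighted average rather than the best case.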

  9. The Ultimatum Game
     • A starts with a fixed amount of money (3)
     • A decides how much (in 0-3) to offer B
     • B decides whether or not to accept the offer
        • If B accepts, each gets the resulting amount
        • If B rejects, both get 0
     • Each has a utility function over money
        • E.g., <.1, .4, .7, 1>
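The rules above are small enough to write out directly. A minimal sketch, assuming both players share the example utility function <.1, .4, .7, 1> indexed by the amount of money held; the names `outcome` and `utilities` are illustrative, not taken from the presentation.

```python
TOTAL = 3                        # amount A starts with
UTILITY = (0.1, 0.4, 0.7, 1.0)   # assumed utility of holding 0, 1, 2 or 3 units

def outcome(offer, accepted):
    """Return (money_a, money_b) for an offer in 0..TOTAL and B's response."""
    if not 0 <= offer <= TOTAL:
        raise ValueError("offer must be between 0 and TOTAL")
    if accepted:
        return TOTAL - offer, offer
    return 0, 0  # a rejection leaves both players with nothing

def utilities(offer, accepted):
    """Return (utility_a, utility_b) for the resulting split."""
    money_a, money_b = outcome(offer, accepted)
    return UTILITY[money_a], UTILITY[money_b]

for offer in range(TOTAL + 1):
    print(offer, utilities(offer, True), utilities(offer, False))
```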

  10. Automatic/Reactive Approach
     • A trellis (factor) graph in LTM with one stage per move
     • Focus on backwards messages from reward(s)
     [Figure: trellis graph with nodes offer, TA, accept, TB, money, exp, reward]

     CONDITIONAL Transition-A
        Conditions: Money(agent:A quantity:moneya)
                    Accept-E(offer:offer acceptance:choice)
        Condacts: Offer(agent:A quantity:offer)
        Function(choice,offer,moneya): 1<T,0,3>, 1<T,1,2>, 1<T,2,1>, 1<T,3,0>, 1<F,*,0>

     CONDITIONAL Transition-B
        Conditions: Money(agent:B quantity:moneyb)
        Condacts: Accept(offer:offer acceptance:choice)
        Function(choice,offer,moneyb): 1<T,0,0>, 1<T,1,1>, 1<T,2,2>, 1<T,3,3>, 1<F,*,0>

     CONDITIONAL Reward
        Condacts: Money(agent:agent quantity:money)
        Function(agent,money): .1<*,0>, .4<*,1>, .7<*,2>, 1<*,3>
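The two transition functions on this slide can be read as lookup tables from (choice, offer) to the money each player ends up with. The sketch below expands them and checks them against the rules of the ultimatum game. It assumes that `*` is a wildcard matching any offer and that each listed entry marks the single allowed money value for that (choice, offer) pair; the helper `expand` is hypothetical and is not part of Sigma.

```python
from itertools import product

CHOICES = ("T", "F")      # B accepts (T) or rejects (F)
AMOUNTS = (0, 1, 2, 3)    # possible offers

# (choice, offer, money) entries copied from the two Function lines.
TRANSITION_A = [("T", 0, 3), ("T", 1, 2), ("T", 2, 1), ("T", 3, 0), ("F", "*", 0)]
TRANSITION_B = [("T", 0, 0), ("T", 1, 1), ("T", 2, 2), ("T", 3, 3), ("F", "*", 0)]

def expand(entries):
    """Expand '*' over all offers and return {(choice, offer): money}."""
    table = {}
    for choice, offer, money in entries:
        offers = AMOUNTS if offer == "*" else (offer,)
        for o in offers:
            table[(choice, o)] = money
    return table

table_a = expand(TRANSITION_A)
table_b = expand(TRANSITION_B)

# Acceptance splits the 3 units; rejection leaves both players with 0.
for choice, offer in product(CHOICES, AMOUNTS):
    expected = (3 - offer, offer) if choice == "T" else (0, 0)
    assert (table_a[(choice, offer)], table_b[(choice, offer)]) == expected
```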

  11. Controlled/Deliberate (Reflective) Approach
     • Decision-theoretic problem-space search across metalevels
     • Very Soar-like, but with softmax combination
     • Depends on summary product and Sigma’s mixed aspect
     • Corresponds to PsychSim’s online reasoning
     [Figure: problem-space search across metalevels for agents A and B, with offers 0-3, accept/reject operators, tie and no-change impasses, and expected values such as E(2) and E(accept)]

  12. Comments on the Ultimatum Game
     • Automatic version (5 conditionals)
        • A’s normalized distribution over offers: <.315, .399, .229, .057>
        • 1 decision (94 messages) and .02 s (on a MacBook Air)
     • Controlled version (19 conditionals)
        • A’s normalized distribution over offers: <.314, .400, .229, .057>
        • 72 decisions (868 messages/decision) and 126.69 s
     • Same result, with distinct computational properties
        • Automatic is fast and occurs in parallel with other memory processing, but is not (easily) penetrable by new bits of other knowledge
        • Controlled is slow, sequential, but can (easily) integrate new knowledge
        • Distinction also maps onto expert versus novice behavior in general
     • Raises possibility of a generalization of Soar’s chunking mechanism
        • Compile/learn automatic trellises from controlled problem solving
        • Finer grained, mixed(/hybrid) learning mechanism
     Distributions comparable
     Speed ratio >6000
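The two callouts on this slide follow from the reported figures by simple arithmetic, sketched below. The dictionary layout and variable names are illustrative; the total message count for the controlled version assumes that 868 is an average per decision over all 72 decisions.

```python
automatic = {"decisions": 1, "messages_per_decision": 94, "seconds": 0.02,
             "offers": (0.315, 0.399, 0.229, 0.057)}
controlled = {"decisions": 72, "messages_per_decision": 868, "seconds": 126.69,
              "offers": (0.314, 0.400, 0.229, 0.057)}

# Speed ratio: controlled time divided by automatic time (about 6334).
speed_ratio = controlled["seconds"] / automatic["seconds"]

# Total messages under the per-decision-average assumption.
controlled_messages = controlled["decisions"] * controlled["messages_per_decision"]
automatic_messages = automatic["decisions"] * automatic["messages_per_decision"]

# Largest gap between the two distributions over A's offers (about 0.001).
max_gap = max(abs(a - c) for a, c in zip(automatic["offers"], controlled["offers"]))

print(speed_ratio, controlled_messages, automatic_messages, max_gap)
```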

  13. Conclusion
     • Simultaneous games are solvable within a single decision
        • Yield Nash equilibria (although linear combination doesn’t guarantee)
     • Sequential games are solvable in either an automatic or a controlled manner
        • Raises possibility of a mixed variant of chunking that automatically learns probabilistic trellises (HMMs, DBNs, …) from problem solving
        • May yield a novel form of general structure learning for graphical models
     • Two architectural modifications to Sigma were required
        • Multiagent decision making (and reflection)
        • Optional exponentiation of outgoing WM messages (for softmax)
     • Future work includes
        • More complex games
        • Belief updating (learning models of others)

  14. Overall Progress in Sigma
     • Memory [ICCM 10]
        • Procedural (rule)
        • Declarative (semantic/episodic)
        • Constraint
     • Problem solving
        • Preference based decisions [AGI 11]
        • Impasse-driven reflection [AGI 13]
        • Decision-theoretic (POMDP) [BICA 11b]
        • Theory of Mind [AGI 13]
     • Learning [ICCM 13]
        • Episodic
        • Concept (supervised/unsupervised)
        • Reinforcement [AGI 12b]
        • Action modeling [AGI 12b]
        • Map (as part of SLAM)
     • Mental imagery [BICA 11a; AGI 12a]
        • 1-3D continuous imagery buffer
        • Object transformation
        • Feature & relationship detection
     • Perception [BICA 11b]
        • Object recognition (CRFs)
        • Localization
     • Natural language
        • Question answering (selection)
        • Word sense disambiguation [ICCM 13]
        • Part of speech tagging [ICCM 13]
        • Isolated word speech recognition
     • Graph integration [BICA 11b]
        • CRF + Localization + POMDP
     Some of these are still just beginnings
