Abstracting and Composing High-Fidelity Cognitive Models of Multi-Agent Interaction • MURI Kick-Off Meeting, August 2008 • Christian Lebiere, David Reitter • Psychology Department, Carnegie Mellon University
Main Issues • Understand scaling properties of cognitive performance • Most experiments look at a single performance point rather than at performance as a function of problem complexity, time pressure, etc. • Key component in abstracting performance at higher levels • Understand interaction between humans and machines • Most experiments study and model human performance under a fixed scenario that misses key dynamics of interaction • Key aspect of both system robustness and vulnerabilities • Understand generality and composability of behavior • Almost all models are developed for specific tasks rather than assembled into larger pieces of functionality from basic pieces • Key enabler of scaling models and abstracting their properties
Cognitive Architectures • What is a cognitive architecture? • Invariant mechanisms to capture generality of cognition (Newell) • Aims for both breadth (Newell Test) and depth (quantitative data) • How are they used? • Develop model of a task (declarative knowledge, procedural strategies, architectural parameters) • Limits of model fitting (learning mechanisms, architectural constraints, reuse of model and parameters) • ACT-R • Modular organization, communication bottlenecks, mapping to brain regions • Mix of symbolic production system and subsymbolic statistical mechanisms
ACT-R Cognitive Architecture • [Figure: ACT-R architecture diagram. Labels: Productions; buffers Goal, Retrieval, Visual, Manual; modules Intentions, Memory, Vision, Motor; World. Subsymbolic annotations: Activation Learning, Latency, Utility Learning. Example chunk with slots Size, Fuel, Turb, Dec.] • Example production: IF the goal is to categorize a new stimulus and visual holds stimulus info S, F, T THEN start retrieval of chunk S, F, T and start manual mouse movement
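The "Activation Learning" and "Latency" labels in the diagram refer to ACT-R's subsymbolic equations, which the slide does not spell out. As a minimal sketch, assuming the standard published forms (base-level learning B = ln(sum of t_j^-d) and retrieval time T = F * exp(-A)), they could be written as follows. The function names and the default values d = 0.5 and F = 1.0 are illustrative assumptions, not parameters reported in this talk.

```python
import math


def base_level_activation(ages, decay=0.5):
    """Base-level activation B = ln(sum_j t_j ** -d).

    `ages` holds the time (in seconds) since each past use of a chunk.
    `decay` is the parameter d; 0.5 is assumed here as a conventional default.
    """
    return math.log(sum(t ** -decay for t in ages))


def retrieval_latency(activation, latency_factor=1.0):
    """Retrieval time T = F * exp(-A): higher activation means faster retrieval.

    `latency_factor` is the scaling parameter F (assumed value).
    """
    return latency_factor * math.exp(-activation)


if __name__ == "__main__":
    # A chunk used 1 s and 4 s ago versus a chunk used only 4 s ago.
    recent = base_level_activation([1.0, 4.0])
    stale = base_level_activation([4.0])
    print(recent, retrieval_latency(recent))
    print(stale, retrieval_latency(stale))
```

Frequent and recent use raises activation, which in turn shortens retrieval latency; this is the sense in which the architecture learns a situation "automatically" without task-specific tuning.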
Model - Methodology • Model designed to solve task simply and effectively • Not engineered to reproduce any specific effects • Reuse of common design patterns • Makes modeling easier and faster • Reduces degrees of freedom • No fine-tuning of parameters • Left at default values or roughly estimated from data (2) • Architecture provides automatic learning of situation • Position & status of AC naturally learned from interaction
Model - Methodology II • As many model runs as subject runs • Performance variability is an essential part of the task! • Model speed is essential (5 times real-time in this case) • Stochasticity is a fundamental feature of the architecture • Production selection • Declarative retrieval • Perception and actions • Stochasticity amplified by interaction with environment • Model captures most of variance of human performance • No individual variations factored in the model (W, efforts)
Model - Overview • 5 (simple) declarative chunks encoding instructions • Associate color to action and penalty • 36 (simple) productions organized in 5 unit tasks • Color-Goal(5): top-level goal to pick next color target • Text-Goal(4): top-level goal to pick next area to scan • Scan-Text(7): goal to scan text window for new messages • Scan-Screen(8): goal to scan screen area for exiting AC • Process(12): processes a target with 3 or 4 mouse clicks • Unit tasks map naturally to ACT-R goal type and production-matching - a natural design pattern
Flyoff - Performance • Performance is much better in the color than text condition • Performance degrades sharply with time pressure for text • Good fit except for text-high: huge variation with tuneup too
Flyoff - Distribution • The model can yield a wide range of performances through retrieval and effort stochasticity and dynamic interaction • Model variability always tends to be lower than the subjects'
Flyoff - Penalty Profile • Errors: no speed change error or click error but incorrect and duplicated messages occurring during the handling of holds • Delays: more holds for high but fewer welcome and speed
Flyoff - Latency • Response times increase exponentially with the number of intervening events, and increase faster in the text than in the color condition • Model is slightly faster in the color condition but slower in the text condition
Flyoff - Selection • The number of selections decreases roughly exponentially, with text starting lower but trailing off longer with final spike • Ceiling effect in color condition (mid & high): see workload
Flyoff - Workload • Workload is higher in text condition and increases faster • Model reproduces both effects but misses ceiling effect in color condition even though it gets it for selection measure!
Learning Categories • Model learns responses through instance-based categorization • Learning curve and level of performance reflects degree of complexity of function mapping aircraft characteristics to response
Transfer Errors • Transfer performance is defined by (linear) similarities between stimuli values along each dimension (size, fuel, turb.) • Excellent match to trained instances (better than trial 8!). • Extrapolated: syntactic priming or non-linear similarities?
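The instance-based categorization and linear similarities described on the last two slides can be sketched as below. This is a simplified, deterministic illustration and not the authors' ACT-R model: it omits activation noise and learning, and the feature coding (integer levels 1 to 3 on size, fuel and turbulence), the `span` value, the response labels and all function names are assumptions made for the example.

```python
def similarity(a, b, span=2.0):
    """Linear similarity along one dimension: 1.0 for equal values,
    falling to 0.0 at the maximum distance `span` (assumed range of levels)."""
    return 1.0 - abs(a - b) / span


def match_score(stimulus, instance):
    """Sum of per-dimension linear similarities (size, fuel, turbulence)."""
    return sum(similarity(s, i) for s, i in zip(stimulus, instance))


def categorize(stimulus, memory):
    """Return the response stored with the best-matching instance.

    `memory` is a list of (features, response) pairs seen during training.
    """
    best_features, best_response = max(
        memory, key=lambda item: match_score(stimulus, item[0])
    )
    return best_response


if __name__ == "__main__":
    memory = [((1, 1, 1), "accept"), ((3, 3, 3), "reject")]
    # A transfer stimulus never seen in training is answered by similarity.
    print(categorize((1, 1, 2), memory))
    print(categorize((3, 2, 3), memory))
```

Because similarity is linear in each dimension, responses to untrained stimuli interpolate between stored instances, which is one way to read the transfer pattern reported on the slide.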
Individual Stimuli Predictions • Good match to probability of accepting individual stimuli for each category. • RMSE: • Cat. 1 = 14.1% • Cat. 3 = 13.4% • Cat. 6 = 12.5%
Task Approach • Use similar task to AMBR - AMBR variant, Team Argus, CMU-ASP (Aegis) - for exploration • Introduce team aspect that is implicit in task by interchangeably replacing controllers by humans, models or agents • Right properties, tractable, scalable even though somewhat abstract • Scale model to other domains (UAV control, Urban Search and Rescue) and environments (DDD, NeoCities) • Force model generalization across environments • Explore fidelity/tractability tradeoffs
Issue 1: Scaling Properties • Cognitive science is usually concerned with absolute performance (e.g. latency) at fixed complexity points • Often less discriminative than scaling properties • Study human performance at multiple complexity points to understand scaling and robustness issues • Scaling provides strong constraints on algorithms and representations • Robustness is a key issue in extrapolating individual performance to multi-agent interaction and overall network performance, reliability and fault-tolerance • Quantify impact on all measures of performance • Converging measures of performance provide stronger evidence than separate measures susceptible to parametric manipulation • Understanding of scaling key to enabling abstraction
Constraints and Analyses • AMBR illustrated strong cognitive constraints put on the scaling of performance as a function of task complexity • Past analyses have shown the impact of: • Architectural component interactions (Wray et al, 2007) • Representational choices (Lebiere & Wallach, 2001) • Parameter settings on dynamic processes (Lebiere, 1998)
Scaling Experiments • Study human performance at multiple complexity points to understand scaling and robustness issues • Vary task complexity (e.g. level of aircraft autonomy) • Vary problem complexity (e.g. number of aircraft) • Vary information complexity (e.g. aircraft characteristics) • Vary network topology (e.g. number of controllers) • Vary rate of change of environment (e.g. appearance or disappearance of aircraft, weather, network topology) • Quantify impact on all measures of performance • Direct performance (number of targets handled, etc) • Situation awareness (levels, memory-based measures) • Workload (both self-reporting and physiological measures)
Issue 2: Dynamic Interaction • Main problem in developing high-fidelity cognitive models of multi-agent interaction is the increased degrees of freedom (DOF) of open-ended agent interaction • Methodology has been developed to model multi-agent interactions in games and logistics (supply chain) problems (West & Lebiere, 2001; Martin et al, 2004) • Develop baseline model to capture first-order dynamics • Replace most humans in the loop (HITL) with baseline model(s) to reduce DOF • Refine model based on greater data accuracy and revalidate • Methodology can be extended to multiple levels of our hierarchy, each time abstracting to next level • Also extends to heterogeneous simulations with mixed levels including HITL, models and agents
Results: Model against Model • Performance resembles a random walk with widely varying outcomes • Distribution of streaks hints at fractal properties • The model with the larger lag will always win in the long run
Results: Model against Human • Performance of human against lag1 model is similar to lag2 model • Lag2 model takes time to get started because of longer chunks, whereas lag1 model starts faster because it uses fewer, shorter chunks
Results: Effects of Noise • Performance improves sharply with noise, then gradually decreases • Noise fundamentally alters the dynamic interaction between players • Noise is essential to adaptation in changing real-world environments
Interactive Alignment • Tendency of interacting agents to align communicative means at different levels (Pickering & Garrod 2004) • Task success is correlated with alignment (Reitter & Moore 2007) • More alignment if interlocutors are perceived to be non-human (Branigan et al. 2003)
Micro-Evolution • Communities will evolve communicative standards • e.g., reference to landmarks, identification strategies for locations (e.g., Garrod & Doherty 1994, Fay et al. in press) • Garrod & Doherty 1994: location identification strategy: counting boxes vs. connections
Micro-Evolution • Evolutionary dynamics apply • How do cognitive agents enable and influence evolution? (Pressure? Heat?)
Autonomous agents • Can autonomous agents support alignment and communicative evolution? • Interaction of humanoid cognitive models with autonomous agents • as a testbed before testing with humans. • How can communicative behavior of UAVs be adapted to take limitations of human cognition into account?
Interaction Experiments • Impact of evolving, interactive communication • Vary constraints on evolution of communication (e.g. fixed vs. adaptive communication channel) • Vary constraints on sharing of communication (e.g. pair-wise vs. community communication development) • Impact of fixed, flexible or emergent network organization • Vary network flexibility (e.g. communication beyond grid) • Vary level of information sharing (e.g. information filters) • Accurate cognitive models for human-machine interaction • Adaptive interfaces (e.g. to predicted model workload) • Model-based autonomy (e.g. handle monitoring, routine decision-making)
Issue 3: Behavior Abstraction • First two issues build solutions toward this one • Study of scaling properties helps capture response function for all aspects of target behavior • Abstraction methodology helps iterate and test models at various levels of abstraction to maximize retention • Issues: • Grain scale of components (generic tasks, unit tasks?) • Attainable degree of fidelity at each level? • Capture individual differences or average, normative behavior? • Latter may miss key interaction aspects (outliers) • Individual differences as architectural parameters (WM, speed) • Use cognitive model to generate data to train machine learning agent tailored to individual decision makers
ACT-R vs. Neural Network Model • [Figure: network diagram with Lag 1 and Lag 2 input fields and an Answer output] • Neural network model based on same principles (West, 1998; 1999) • Simple 2-layer neural network • Localist representation • Linear output units • Fixed lag of 1 or 2 • Dynamics arise from the interaction of the two networks • Network structure (fields) can be mapped to chunk structure (slots) • ACT-R and network both store game instances (move sequences) • ACT-R and network are similarly sensitive to game statistics • Noise plays a more deliberate role in ACT-R than neural network
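A two-layer localist network with linear output units and a fixed lag, as listed above, might look like the sketch below. Several details are assumptions and not taken from the slides: the game is taken to be rock-paper-scissors, the learning rule simply adds 1.0 to the weights leading to the move the opponent actually played, and ties are broken at random with a seeded generator. The class and method names are hypothetical.

```python
import random

MOVES = ("rock", "paper", "scissors")
# BEATS[m] is the move that defeats m (assumed game: rock-paper-scissors).
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class LagNetwork:
    """Two-layer network: one localist input unit per (lag position, move),
    three linear output units, one per predicted opponent move."""

    def __init__(self, lag=1, seed=0):
        self.lag = lag
        self.rng = random.Random(seed)
        self.history = []  # opponent moves observed so far
        self.weights = {
            (pos, move): {out: 0.0 for out in MOVES}
            for pos in range(lag)
            for move in MOVES
        }

    def _outputs(self):
        """Linear outputs: sum of the weights from the active input units."""
        out = {o: 0.0 for o in MOVES}
        recent = self.history[-self.lag:]
        if len(recent) < self.lag:
            return out  # not enough context yet: all outputs tie at zero
        for pos, move in enumerate(recent):
            for o in MOVES:
                out[o] += self.weights[(pos, move)][o]
        return out

    def predict(self):
        """Predicted opponent move: strongest output, ties broken at random."""
        out = self._outputs()
        top = max(out.values())
        return self.rng.choice([o for o in MOVES if out[o] == top])

    def choose(self):
        """Play the move that beats the predicted opponent move."""
        return BEATS[self.predict()]

    def observe(self, opponent_move):
        """Assumed learning rule: strengthen active inputs -> observed move."""
        recent = self.history[-self.lag:]
        if len(recent) == self.lag:
            for pos, move in enumerate(recent):
                self.weights[(pos, move)][opponent_move] += 1.0
        self.history.append(opponent_move)


if __name__ == "__main__":
    net = LagNetwork(lag=2)
    for move in ["rock", "paper", "scissors"] * 3:
        net.observe(move)
    print(net.predict(), net.choose())
```

Pitting two such networks against each other, each calling `choose()` and then `observe()` on the other's move, gives the coupled dynamics the slide attributes to the interaction of the two networks. The input fields per lag position correspond to the chunk slots mentioned above.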
Individual vs Group Models • Model of sequence expectation applied to baseball batting • Key representation and procedures general, not domain-specific • Cognitive architecture constrains performance to reproduce all main effects: recency, length of sequence and sequence ordering • Variation in performance between subjects can be captured using straightforward parameterization of perceptual-motor skills
Markov Model (Gray, 2001) • 2 states: expecting fast or slow pitch • Probabilities of switching state (a_s, a_f) and temporal errors when expecting fast and slow pitch (T_f, T_s) need to be estimated • 2 more transition rules and associated parameters (a_k, a_b) to handle pitch count • Basic Markov assumption: current state determines future
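The two-state structure on this slide can be illustrated with a generic two-state Markov chain. This is a sketch under stated assumptions and not Gray's actual parameterization: the two switching probabilities are read here as a_f (probability of leaving "expecting fast") and a_s (probability of leaving "expecting slow"), the values used are made up, and the temporal-error and pitch-count parameters are left out.

```python
def step(p_fast, a_f, a_s):
    """One Markov update of the probability of being in 'expecting fast'.

    Stay in 'fast' with probability 1 - a_f; enter it from 'slow' with a_s.
    """
    return p_fast * (1.0 - a_f) + (1.0 - p_fast) * a_s


def stationary_expect_fast(a_f, a_s):
    """Long-run probability of 'expecting fast' for the two-state chain."""
    return a_s / (a_f + a_s)


if __name__ == "__main__":
    a_f, a_s = 0.2, 0.3  # hypothetical switching probabilities
    p = 1.0              # start out expecting a fast pitch
    for _ in range(50):
        p = step(p, a_f, a_s)
    print(p, stationary_expect_fast(a_f, a_s))
```

The update depends only on the current state probability, which is the Markov assumption named on the slide; both switching probabilities are free parameters that would have to be estimated from data.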
Markov vs. ACT-R • State representation • Markov has discrete states that represent decisions • ACT-R has graded states that reflect the state of memory • Transition probabilities • Markov needs to estimate state transition probabilities • ACT-R predicts state change based on theory of memory • Pitch count • Markov has to adopt additional rules and parameters • ACT-R generalizes using established representation • ACT-R is more constrained than Markov model • Similar results for backgammon domain: • Comparable results to NN and TD-learning with orders of magnitude fewer training instances
Abstraction Experiments • Impact of Representation Fidelity • Vary degree of model fidelity to determine impact on network dynamics (e.g. high- vs. low-fidelity nodes for specialists vs. generalists) • Determine which model aspects are critical to performance • Impact of Skill Compositionality • Enforce skill composition through standard, common interface and determine impact on performance • Evaluate impact of architectural constructs including working memory support for multi-tasking • Relevant computer science concepts • Abstract Behavior Types • Generalization of abstract data types to temporal streams • Aspect-Oriented Programming • Generalization to allow more complex procedural interaction