Cognitively-inspired Agents as Teammates and Decision Aids

John Yen,1 Laura Strater,2 Michael McNeese,1 Xiaocong Fan,1 Haydee Cuevas,2 Sooyoung Oh,1 Hyun-Woo Kim,1 Dev Minotra,1 and Timothy Hanratty3
1 The Pennsylvania State University; 2 SA Technologies; 3 US Army Research Lab

Challenges of Human-Automation Interaction in Asymmetric Warfare
• Complex, multi-dimensional (social, cultural) decisions are pushed down to lower-echelon units.
• Relevant information must reach warfighters at the right time without overloading them.
• Decision aids and warfighters must collaborate effectively, with a suitable level of trust.
• Warfighters need to be aware of the automation's underlying assumptions and of how it represents the global situation and its dynamics.

Approach
• Use RPD-enabled agents (R-CAST) as decision aids that proactively anticipate and seek relevant information.
• Design a synthetic C2 task involving multiple dimensions: the Three-Block Challenge.
• Study the factors that affect human-agent collaboration and trust through a series of experiments.

The Living Laboratory Framework (McNeese, 2006)
[Figure: problem-based approach; components: theory, practice, ethnographic study, knowledge elicitation, scaled worlds, reconfigurable prototypes]

Living Lab Framework (McNeese, 2002)
[Figure: human macrocognitive model; RPD-enabled decision aids; scaled worlds: Three-Block Challenge; reconfigurable prototypes: human-agent collaboration and trust experiments]

R-CAST: RPD-enabled Agents
[Figure: R-CAST architecture. The RPD decision model performs situation analysis using a knowledge base and an experience base. A familiar situation is recognized through feature matching; an unfamiliar one triggers investigation and experience adaptation. The chosen course of action (COA) is evaluated as workable or unworkable, implemented, and watched through expectancy monitoring for anomalies; experiences are then evaluated for learning. Supporting components: teamwork manager, taskwork manager, process manager (execute/monitor), option recommender, information manager (inference rules, investigation strategies; relates high-level information needs to lower-level information, anticipates information requirements, seeks relevant information), and communication manager (directory and protocol, conversations). Guiding questions: What to do? How to evaluate options? How to implement it? What cues are needed? Who needs it? What expectancies are monitored? How to seek/share information? How to communicate?]
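The recognition loop in the R-CAST diagram can be illustrated with a minimal Python sketch. It assumes a hypothetical `Experience` record (cues, expectancies, course of action) and a simple rule that a situation is familiar only when all of an experience's cues match; these names, the scenario values, and the matching rule are illustrative assumptions, not R-CAST's actual implementation. A familiar situation yields a course of action; an unfamiliar one yields the missing cues the agent should seek, mirroring the "anticipate information requirements" step, and `anomalies` stands in for expectancy monitoring.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """A stored experience (hypothetical structure, not R-CAST's schema)."""
    name: str
    cues: dict          # cue name -> value that characterizes this situation
    expectancies: dict  # monitored variable -> value expected if the match holds
    course_of_action: str


def match_score(exp, observed):
    """Fraction of an experience's cues satisfied by the observed situation."""
    hits = sum(1 for cue, value in exp.cues.items() if observed.get(cue) == value)
    return hits / len(exp.cues)


def rpd_step(experiences, observed, threshold=1.0):
    """One pass of recognition-primed decision making.

    Returns ("implement", COA) when the situation is familiar, otherwise
    ("seek", missing_cues): the information the agent should obtain next.
    """
    best = max(experiences, key=lambda e: match_score(e, observed))
    if match_score(best, observed) >= threshold:
        return "implement", best.course_of_action
    # Unfamiliar: derive information requirements from the closest experience.
    missing = [cue for cue in best.cues if cue not in observed]
    return "seek", missing


def anomalies(exp, observed):
    """Expectancy monitoring: expectancies contradicted by new observations."""
    return [k for k, v in exp.expectancies.items()
            if k in observed and observed[k] != v]


# Hypothetical experience base for two Three-Block-style event types.
experiences = [
    Experience("ied", {"report": "ied", "area": "market"},
               {"crowd": "calm"}, "dispatch EOD team"),
    Experience("riot", {"report": "crowd", "mood": "hostile"},
               {"spread": "no"}, "send police units"),
]

print(rpd_step(experiences, {"report": "ied", "area": "market"}))
print(rpd_step(experiences, {"report": "crowd"}))
print(anomalies(experiences[0], {"crowd": "agitated"}))
```

Returning the missing cues rather than guessing is the design point being illustrated: an unfamiliar situation turns into a concrete information request.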

The Three-Block Challenge
• A simulation environment for studying decision making in asymmetric warfare.
• Decisions are made based on intelligence reports.
• Three kinds of missions (peacekeeping, humanitarian, and combat) take place in close proximity to one another.

C2 Simulation Environment

Four Human-centric Experiments
• Experiment 1: Supporting Multi-Context Decision Making
• Experiment 2: Trust in Cognitive Aids
• Experiment 3: Agent Error Patterns and Human Trust Calibration
• Experiment 4: Visualization of Agent Decision Space (VADS) of RPD Agents

Experiment 1: Supporting Multi-Context Decision Making
• Participants had to make decisions in multiple contexts.
• The frequency of context switching was varied in the experiment.
• C2 decision-making performance improved with RPD-enabled agents.
• Performance improved most under high context-switching frequency.

Experiment 2: Human Trust
• We studied the effect of knowledge of agent reliability on automation usage decisions.
• A systematic error was introduced into the agents' decisions.
• Participants were divided into two groups: one knew the source of the error, the other did not. Both groups were told the agents' reliability.
• Participants who knew the source of the agents' errors had:
  + Better SA associated with automation errors
  + Better automation usage decisions (AUDs)
  + Better-calibrated trust

Experiment 2 Results
• With knowledge of the factors that affect agent reliability, participants showed:
  • More suitable automation usage decisions
  • A more suitable level of trust in the agents
[Figure: No KM vs. KM (knowledge manipulation) groups]

Experiment 3: Agent Error Patterns and Human Trust Calibration
• The previous experiment used systematic errors, whose source participants could make sense of.
• This study compared two conditions:
  • agents with random errors, and
  • agents with systematic errors.
• Participants in the systematic-error condition made better automation usage decisions.

Experiment 3 Results: Systematic Errors vs. Random Errors
[Figure: two panels, each comparing the Systematic Error and Random Error conditions: the number of correct recommendations changed correctly, and the number of correct recommendations accepted]
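The automation-usage measures in the Experiment 3 figure can be made concrete with a small tally. This is a minimal sketch under an assumed, hypothetical trial record (`Trial`: whether the agent's recommendation was correct, whether the participant accepted it unchanged, and whether the final decision was correct); the study's actual logging format and scoring rules are not given here. As a further labeled assumption, an automation usage decision counts as appropriate when a correct recommendation is accepted or an incorrect one is changed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Trial:
    """One agent recommendation and the participant's response (hypothetical)."""
    recommendation_correct: bool  # was the agent's recommendation correct?
    accepted: bool                # did the participant accept it unchanged?
    final_correct: bool           # was the participant's final decision correct?


def aud_counts(trials: Iterable[Trial]) -> dict:
    """Tally automation-usage outcomes over a set of trials."""
    counts = {"n": 0, "correct_accepted": 0, "changed_correctly": 0,
              "appropriate_aud": 0}
    for t in trials:
        counts["n"] += 1
        if t.recommendation_correct and t.accepted:
            counts["correct_accepted"] += 1
        if not t.accepted and t.final_correct:
            counts["changed_correctly"] += 1
        # Appropriate reliance: accept when correct, override when incorrect.
        if t.accepted == t.recommendation_correct:
            counts["appropriate_aud"] += 1
    return counts


trials = [
    Trial(recommendation_correct=True, accepted=True, final_correct=True),
    Trial(recommendation_correct=False, accepted=False, final_correct=True),
    Trial(recommendation_correct=False, accepted=True, final_correct=False),
    Trial(recommendation_correct=True, accepted=False, final_correct=False),
]
print(aud_counts(trials))
```

Under this operationalization, over-reliance (accepting a wrong recommendation) and under-reliance (overriding a right one) both lower the appropriate-AUD count, which is the sense in which trust is "calibrated".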

A Cognitive Model of Human-Agent Trust (Experiments 2 and 3)
• When error patterns are understandable, knowledge about the cause of automation errors improves automation usage decisions.
• Humans trust the agent less when the pattern of errors is hard to recognize.

Experiment 4: Visualization of Agent Decision Space (VADS)
• Can VADS enhance warfighters’ situation awareness?
• Can it help warfighters project how threats will change?

R-CAST Visualization of Agent Decision Space (VADS)

Experiment 4 Design
• 32 participants were recruited (16 per display condition).
• Display condition was a between-subjects factor.
• One group used the VADS; the other used an Agent-Decision Table (ADT).

Quick Glance
• Score differences are more pronounced in the high-workload scenarios (3 and 4) than in the low-workload scenarios (1 and 2).
• The score difference for Scenario 3 is statistically significant (p = .024), especially for crowd control.
• A possible explanation for the effect in Scenario 3 is given in the discussion.

Experiment Design: Distribution of Event Threads Running in Parallel
Mean number of events of each type (fast-burners, slow-burners, non-events) appearing in each scenario

Experiment Data Analysis • Performance Data Analysis • SAGAT Data Analysis

Tasking Analysis
Average number of targets per trial, EXP group vs. CON group
• TA: Target Appeared
• DM: Decision Made
• TC: Target Cleared

Score Analysis
[Figure: histogram of scores, EXP vs. CON]

Statistical Analysis of Scores
• Scenarios differed in their conditions and workload, in terms of the distribution of event types.
• The score on each scenario was treated as a separate dependent variable.
• For detailed analysis, scenario scores were decomposed into IED-target, key-insurgent-target, and crowd-target components.
• A MANOVA was conducted with all scenario scores as DVs, followed by univariate tests on the individual scenario scores and on the crowd-only scenario scores.
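The univariate follow-up tests can be sketched as a one-way ANOVA on a single scenario score with display group as the factor. This minimal sketch uses only the Python standard library and computes the F statistic, its degrees of freedom, and eta squared; the MANOVA step itself is not shown, and the scores below are hypothetical, not study data. A p-value would come from the F distribution, for example via SciPy's `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variability: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Hypothetical crowd-component scores for one scenario (not study data).
exp_scores = [5, 6, 7]
con_scores = [2, 3, 4]
print(one_way_anova(exp_scores, con_scores))
```

With two groups, this F equals the square of the pooled-variance two-sample t statistic, so the univariate test is equivalent to a t-test on that scenario score.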

Analyzing Individual Scenario Scores
• Scenarios 3 and 4 show larger group differences than Scenarios 1 and 2.
• Scenario 3 shows a statistically significant difference (p < .05).

Scenario-Score Box Plots
[Figure: box plots of scores for each scenario (SCEN1 to SCEN4 = Scenarios 1 to 4) and of crowd-specific scores per scenario, by group (EXP_CON: 1 = Experimental, 2 = Control)]

Univariate Tests: Individual Crowd Scores per Scenario
• The experimental group scored significantly higher on the crowd component of Scenario 3 (p < .01).

Performance Data Analysis
• The experimental group (using VADS) scored significantly better (p = .024) on one of the two high-workload scenarios (3 and 4), and the effect was largest for crowd control (p = .009).
• Possible explanation: VADS may help participants make projections (Level 3 SA), especially when projection is hard (e.g., an even distribution of fast-burner and slow-burner events under high workload).
• Further study is needed to verify this.

Experiment Data Analysis • Performance Data Analysis • SAGAT Data Analysis

Chi-Square Analysis on SAGAT Data
• A chi-square analysis was done on every SAGAT query to determine whether the groups differed in correct responses.
• We assessed whether the proportions of “wrong” and “right” responses differed significantly from the hypothesized proportion of .5.
• In other words, we assessed whether each group's responses were better than chance (50% right, 50% wrong).
• This method let us conclude indirectly whether the experimental group (VADS) performed better than the control group (TADS), by showing that one group did better than chance while the other did not.
• We hypothesized that participants in the VADS condition would perform better on crowd-related queries than those in the TADS condition.
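The per-group test against chance can be written as a chi-square goodness-of-fit test on right/wrong counts with expected proportions of .5. This is a minimal standard-library sketch; the counts are hypothetical, and the original analysis software is not specified. With two categories there is one degree of freedom, so the upper-tail p-value of a statistic x is erfc(sqrt(x / 2)). A direct group-by-correctness comparison would instead use a 2×2 contingency test; the sketch follows the per-group approach described above.

```python
import math


def chi_square_vs_chance(right, wrong, alpha=0.05):
    """Goodness-of-fit test of right/wrong counts against a 50/50 split.

    Returns (chi2, p, better_than_chance). With 1 degree of freedom the
    upper-tail p-value of the chi-square statistic is erfc(sqrt(chi2 / 2)).
    """
    n = right + wrong
    expected = n / 2
    chi2 = ((right - expected) ** 2 + (wrong - expected) ** 2) / expected
    p = math.erfc(math.sqrt(chi2 / 2))
    # The test is two-sided, so also require more right than wrong answers.
    better_than_chance = p < alpha and right > wrong
    return chi2, p, better_than_chance


# Hypothetical counts for one query in one group (not study data).
print(chi_square_vs_chance(12, 4))
```

The direction check matters: a group that answered mostly wrong would also differ significantly from chance, but not in the "better" direction.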

Chi-Square Results: Table
• The chi-square result for Query 4 supported the hypothesis that the VADS group would perform better on crowd-related queries than the TADS group.
• The VADS group performed better on Queries 9 and 10.
• The TADS group performed better on Query 15.
• The chi-square analyses for all other queries were not significant.

Chi-Square Results: Graphs
• Query 4: Was the last crowd target you assigned units to more likely to become an insurgency event or a sectarian event?
• Query 9: How many police units were untasked when the scenario ended?
• Query 10: How many combat units were untasked when the scenario ended?
• Query 15: What was your final score?

SAGAT Analysis Conclusion
• Participants using the VADS display performed better on situation awareness (SA) queries than participants using the TADS display.
• We assessed this indirectly, by showing that one group performed better than chance while the other did not.
• The VADS display supports Level 1 SA information on unit queries more than the TADS display does (Queries 9 and 10).
• The VADS display supports Level 2 and 3 SA information on crowd queries more than the TADS display does (Query 4).

Human-Agent Collaboration via RPD
• RPD as a new component of the shared mental model
[Figure: RPD decision situation awareness; RPD decision progress; rule-based knowledge analysis for recognition diagnosis]

Editor for timed-process

Scientific Contributions
• Generated a better understanding of human-agent interaction regarding trust and automation usage decisions (AUD).
• Conducted a preliminary investigation of how automation transparency (through visualization) improves AUD.
• Developed cognitively-inspired agents as teammates and decision aids.

Army Transitions: Information Exploitation
[Figure: DCGS-A architecture with data sources, the DCGS-A Integrated Backbone (DIB), DCGS-A user services (data mining, analysis, visualization), a Level 2 fusion Relationship Discovery Service (organization, aggregation, correlation), social networking analysis, information exploitation enablers, and fusion products (findings, supporting evidence) for users such as the analyst (query) and the commander]
• Army fusion initiatives: Soft Target Exploitation and Fusion ATO; Advanced All Source Fusion ATO; Tactical Human Integration of Networked Knowledge (THINK) Army Technology Objective
• Developing agent-assisted information mediation services to enhance Army intelligence
• Improving distributed collaboration and human-agent decision making in complex network-enabled operations

Future Directions: Bridging RPD Agents with NeoCities [McNeese et al., 2004]
• A test-bed for studying teams and decision making in emergency-crisis response.
• It has been used to study the effects of:
  • Task load (complexity, pace, severity)
  • Cognitive aids
  • Context

Future Directions
• Incorporate trials-to-criterion measures using metrics developed by Robert Hoffman (IHMC).
• Further study the impact of VADS on AUD and on maintaining global SA.
• Complete the comparison with Walter Warwick’s computational RPD model in Micro Saint (Alion).
• Publications
• Neuro-ergonomic studies of human-agent interaction, trust, visualization of agent decision space, and SA.

Acknowledgements
• SA Tech: Dan Colombo, Corey Pearson, Mica Endsley
• Alion: Walter Warwick
• ARL: Laurel Allender
• IHMC: Robert Hoffman
• Penn State: Kristinka Ivannova

Publications
• X. Fan, S. Sun, B. Sun, G. Airy, M. McNeese, and J. Yen. Collaborative RPD-Enabled Agents Assisting the Three-Block Challenge in Command and Control in Complex and Urban Terrain. In Proc. of Behavior Representation in Modeling and Simulation (BRIMS), 2005.
• X. Fan, S. Sun, M. McNeese, and J. Yen. Extending Recognition-Primed Decision Model for Human-Agent Collaboration. In Proc. of the Fourth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’05), 2005.
• X. Fan, B. Sun, S. Sun, M. McNeese, J. Yen, R. Jones, T. Hanratty, and L. Allender. RPD-Enabled Agents Teaming with Humans for Multi-Context Decision Making. In Proc. of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’06), 2006.
• X. Fan and J. Yen. R-CAST: Integrating Team Intelligence for Human-Centered Teamwork. In Proc. of the Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
• X. Fan, S. Oh, M. McNeese, J. Yen, H. Cuevas, L. Strater, and M. Endsley. The Influence of Agent Reliability on Trust in Human-Agent Collaboration. In Proc. of the European Conference on Cognitive Ergonomics (ECCE’08), 2008.
• L. Strater, M. McNeese, H. Cuevas, X. Fan, S. Oh, and J. Yen. Making Sense of Error Patterns: Toward Appropriate Human-Agent Trust. Cognitive Science, in review.
• 1 US patent application on Collaborative RPD for Human-Agent Collaboration

Graduates
• Rashaad Jones, PhD in IST, SA Tech
• Shuang Sun, PhD in IST
• Cong Chen, PhD in IST, IBM
• Rui Wang, PhD in IST
• Kaivan Kamali, PhD in CSE, Bloomberg
• Bingjun Sun, PhD in CSE

Thank you

Experiment 4 - Demographics

Experiment 4 - Score Analysis
[Figure: box plot of scores, EXP vs. CON]
Should outliers be removed?

Experiment 4 - Score Analysis
Were participants still learning?

Experiment 4 - Score Analysis
[Figure: box plots of scores by trial order, EXP group vs. CON group]

Experiment 4 - Score Analysis
[Figure: box plots of scores by scenario, EXP group vs. CON group]

Real-Time SA (RT-SA) Analysis
[Figure: box plot of scores, EXP vs. CON]

RT-SA Analysis
No learning effect?
