Pennsylvania State University John Yen Chen Zhong Peng Liu

Revealing the Characteristics of Cyber Analysts’ Reasoning Processes: A Trace Analysis Approach
Annual Review, ARO MURI on Computer-aided Human-centric Cyber SA, October 29, 2013
Pennsylvania State University: John Yen, Chen Zhong, Peng Liu
Army Research Laboratory: Robert Erbacher, Steve Hutchinson, Renee Etoty, Hasan Cam, William Glodek

Computer-Aided Human Centric Cyber Situation Awareness
J. Yen, C. Zhong, P. Liu, R. Erbacher, S. Hutchinson, R. Etoty, H. Cam, W. Glodek
• Objectives
  - Understand the analytical reasoning process of cyber analysts
  - Capture the analytical reasoning traces of cyber analysts through a non-invasive tool
  - Develop a model of the analytical reasoning process that can capture rich traces and enable automated trace analysis
  - Conduct experiments involving cyber analysts
• Scientific/Technical Approach
  - Developed the Observation-Hypothesis-Action (OHA) model of the analytical reasoning process
  - Developed and implemented the Analytical Reasoning Support Tool for Cyber Analysis (ARSCA)
  - Designed experiments that capture realistic challenges in cyber SA using VAST 2012
  - Collaborated with an ARL study on visualization of cyber SA led by Dr. Erbacher
  - Conducted multiple pilot studies (at Penn State and Army Research Lab) to polish ARSCA
• Accomplishments
  - Conducted experiments, in collaboration with Army Research Lab, involving subjects from Penn State and ARL
  - An initial case study on trace analysis provided new insights into the reasoning process of analysts
  - Initial correlation analysis suggests a relationship between characteristics of traces and performance/expertise
• Opportunities
  - Improve the performance of analysts through OHA-based training
  - Investigate the differences in strategies between experts and novices
  - Investigate using aggregated analyst experiences to support the analytical reasoning process

[Figure: project overview diagram. Real world: computer network (enterprise model, activity logs, IDS reports, vulnerabilities), system analysts, test-bed. Components: Cognitive Models & Decision Aids (instance-based learning models, simulation, measures of SA & shared SA); Software Sensors, Probes (Hyper Sentry, Cruiser); Information Aggregation & Fusion (transaction graph methods, damage assessment); Automated Reasoning Tools (R-CAST, plan-based narratives, graphical models, uncertainty analysis); Data Conditioning; Association & Correlation; Multi-Sensory Human Computer Interaction.]

Year 4 Accomplishments at a Glance
• Publications:
  - Zhong, C., Kirubakaran, D.S., Yen, J., Liu, P., Hutchinson, S., & Cam, H., “How to Use Experience in Cyber Analysis: An Analytical Reasoning Support System”, in Proceedings of IEEE Conference on Intelligence and Security Informatics (ISI), 2013.
  - Chen, P.C., Liu, P., Yen, J., & Mullen, T., “Experience-based cyber situation recognition using relaxable logic patterns”, in IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), pp. 243-250, 2012.
  - Chen Zhong, VAST 2013 Workshop Presenter
  - Working papers for CogSIMA 2014
• Technology transfer:
  - J. Yen as summer faculty fellow at ARL
  - Deep collaborations with ARL researchers:
    - Brought the ARSCA toolkit to Adelphi site
    - 12 ARL security analysts participated
    - Weekly teleconferences
    - Joint work on a series of papers
  - Invention Disclosure to PSU
• Awards:
  - Best Paper Award, CogSIMA 2012
  - Chen Zhong: Grace Hopper Celebration of Women in Computing Scholarship
  - Chen Zhong: Honorable Mention, VAST Challenge 2013, Mini-Challenge 3 (Visual Analytics for Cyber SA)
• Students:
  - Chen Zhong, PhD
• Tools:
  - ARSCA

Cyber SA Depends on Human Analysts
[Figure labels: Attacks; Network; Data Sources (feeds); Depicted Situation; Ground Truth (estimates); Compare; Job Performance]

“Hi Bob, how did you nail it?” Answer A: “you know this is my job” Answer B: “this tool is awesome” Answer C: “I talked to Jacob” Answer D: “I employ good reasoning” [our research focus]

High-level research questions Q1. How do analysts reason? Q2. Does good reasoning matter? Q3. If it matters, how can we enable analysts to do more good reasoning and less bad reasoning? [training?] Q4. How to automate, and to what extent? • Understanding the analyst’s reasoning processes is essential to bridge the gaps between humans and tools. • The analyst’s reasoning processes provide insights on how to automate…

Prerequisites P1. Need to get the reasoning processes of analysts P2. Need to characterize these reasoning processes P3. Need to correlate the characteristics with job performance • In AI, this is related to “knowledge acquisition/elicitation” • In Cognitive Science, this is denoted “theories on how we reason”, e.g., the mental model theory, the procedural memory concept in ACT-R

Existing Knowledge Acquisition Approaches • CTA: cognitive task analysis • Simulation: ACT-R needs “procedural memory” • Knowledge engineering in expert systems • Case-based learning • The Knowledge Acquisition Bottleneck (Feigenbaum)

Our Approach Insight 1: Diverse reasoning processes may share common structures and critical elements • We propose: OHA model Insight 2: These critical elements and the relationships among them later on could be used to recover the reasoning processes Insight 3: Using a software tool to track the traces of analysts’ reasoning processes • We built ARSCA (Analytical Reasoning Support Tool for Cyber Analysis) toolkit

Three Merits • The analyst does not need to remember what he/she did; the reasoning process can be restored automatically from the traces. • The traces can be correlated directly with job performance. • The traces provide abundant details; thoughts are expressed in natural language.

Challenges C1. Validation challenge: Are the restored reasoning processes really the original ones? C2. Tracing challenge: How to trace in a non-intrusive, non-distracting manner? C3. Tradeoff challenge: automation in trace analysis (how structured should the traces be?) versus how much information can be collected.

Task Design
• Data from VAST 2012 Challenge
• Data sources
  - Corporate network configuration
  - Firewall logs: 26,000,000 entries
  - IDS alerts: 35,000 entries
• Ground truth
  - An attack over two days (40 hours)

Task Design (2)

Tracing Tool Architecture
[Figure: architecture diagram. Labels: mouse and keyboard input; invisible tracking (keystrokes, data filtering conditions, observations); views; queries; DBMS engine over IDS alerts, firewall logs, and other sources; a tree of thoughts; answers; XML traces.]

How to work with the Tool? • Demo1: working with the tool. • Demo2: traces are captured in XML files.

Let’s Look into the Traces • One trace (“pilot 1”) • A quick replay of the analytical reasoning process • A quick look at the trace • Look into the Hypotheses • Look into the Actions and Observations (which form the “context” of the hypotheses) • Compare 10 Traces • Initial Correlation

A Quick Replay of the Analytical Reasoning Process Video

A Quick Look at the Trace
• Duration: 36 min
• E-Tree: # of Nodes: 30; Width: 8; Depth: 3
• Trace Operations: # of Operations: 92
• …
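The tree features listed on this slide (number of nodes, width, depth) can be computed with a level-by-level traversal. The sketch below is only an illustration under stated assumptions: the `Node` class is hypothetical and not ARSCA's own data structure, width is taken to be the largest number of nodes on any single level, and depth is taken to be the number of levels below the root. The slides do not define these terms, so the real tool may count differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node (hypothetical structure, not taken from ARSCA)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def tree_metrics(root: Node) -> dict[str, int]:
    """Return node count, width, and depth of the tree rooted at `root`.

    Assumed definitions: width = size of the largest level,
    depth = number of levels below the root (a lone root has depth 0).
    """
    nodes = 0
    width = 0
    depth = -1
    level = [root]
    while level:
        depth += 1                      # one more level visited
        nodes += len(level)             # count every node on this level
        width = max(width, len(level))  # track the widest level so far
        level = [child for node in level for child in node.children]
    return {"nodes": nodes, "width": width, "depth": depth}
```

Calling `tree_metrics` on the root of a recorded tree gives one feature vector per trace, which is the form needed for comparing traces side by side.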

H-Tree and E-Tree
• EU (Experience Unit)
  - Action 1: Looking into IDS alerts
  - Observation 1: Continuous occurrence of alerts showing an outside IP connecting to various inner IPs
• Hypothesis 1 (H1): The outside IP is the malicious C&C server
• EU (Experience Unit)
  - Action 2: Looking into the firewall log, checking network flow from this suspicious IP
  - Observation 2: All the destination ports are different.
• Hypothesis 2 (H2): This outside IP may do a port scan.
[Figure: the E-Tree shown alongside the H-Tree (H1, H2, …)]

Operations on Hypotheses
• H_New: Create a hypothesis
• H_Sbling: Add a sibling/alternative hypothesis
• H_Jump: Change the current focus from one hypothesis to another
• H_Edit_Content: Edit the content of a hypothesis
• H_Edit_Truth: Edit the truth value of a hypothesis
[Figure: H-Tree with nodes H1, H2, …]
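A minimal sketch of how the five hypothesis operations could act on a tree while logging each step. Everything beyond the operation names is an assumption made for illustration: the `Hypothesis` and `HTree` classes, the `H1`, `H2`, … identifiers, the choice that H_New attaches the new hypothesis under the current focus, and the tuple layout of the log are all hypothetical and do not describe ARSCA's implementation.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(eq=False)
class Hypothesis:
    """A node of the H-Tree (field names are assumptions)."""
    hid: str
    content: str
    truth: str = "Unknown"
    parent: "Hypothesis | None" = field(default=None, repr=False)
    children: list = field(default_factory=list)


class HTree:
    """Hypothetical H-Tree that records every operation in `log`."""

    def __init__(self):
        self.roots = []       # top-level hypotheses
        self.current = None   # hypothesis currently in focus
        self.log = []         # recorded operations, in order
        self._ids = count(1)

    def _add(self, content, parent):
        node = Hypothesis(f"H{next(self._ids)}", content, parent=parent)
        siblings = parent.children if parent is not None else self.roots
        siblings.append(node)
        self.current = node   # a new hypothesis takes the focus
        return node

    def h_new(self, content):
        """H_New: create a hypothesis under the current focus (assumed)."""
        node = self._add(content, self.current)
        self.log.append(("H_New", node.hid))
        return node

    def h_sibling(self, content):
        """H_Sbling: add an alternative next to the current hypothesis."""
        if self.current is None:
            raise ValueError("no current hypothesis to add a sibling to")
        old = self.current
        node = self._add(content, old.parent)
        self.log.append(("H_Sbling", old.hid, node.hid))
        return node

    def h_jump(self, target):
        """H_Jump: move the focus to another hypothesis."""
        source = self.current.hid if self.current is not None else None
        self.log.append(("H_Jump", source, target.hid))
        self.current = target

    def h_edit_content(self, content):
        """H_Edit_Content: rewrite the text of the current hypothesis."""
        self.current.content = content
        self.log.append(("H_Edit_Content", self.current.hid))

    def h_edit_truth(self, truth):
        """H_Edit_Truth: change the truth value of the current hypothesis."""
        self.log.append(("H_Edit_Truth", self.current.hid, self.current.truth, truth))
        self.current.truth = truth
```

Keeping the log separate from the tree means the final tree shows what the analyst concluded, while the log preserves the order in which the analyst got there.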

Look into the Hypotheses of Pilot1 (Cont’d) • # of hypotheses: 21 • Operations on the hypotheses (next slide)

Look into the Hypotheses of Pilot1

The Context of a Hypothesis (Context 1)
• Action 1: Looking into IDS alerts
• Observation 1: Continuous occurrence of alerts showing an outside IP connecting to various inner IPs
• Hypothesis 1: The outside IP is the malicious C&C server
• Action 2: Looking into the firewall log, checking network flow from this suspicious IP
• Observation 2: All the destination ports are different.
• Hypothesis 2: This outside IP may do a port scan.

The Context of a Hypothesis (Context 2)
• Action 1: Looking into IDS alerts
• Observation 1: Continuous occurrence of alerts showing an outside IP connecting to various inner IPs
• Hypothesis 1: The outside IP is the malicious C&C server
• Action 2: Looking into the firewall log, checking network flow from this suspicious IP
• Observation 2: All the destination ports are different.
• Hypothesis 2: This outside IP may do a port scan.

Actions and Observations in E-Tree (Cont’d)
• Action 1: Looking into IDS alerts
• Observation 1: Continuous occurrence of alerts showing an outside IP connecting to various inner IPs
• Hypothesis 1: The outside IP is the malicious C&C server
• Action 2: Looking into the firewall log, checking network flow from this suspicious IP
• Observation 2: All the destination ports are different.
• Hypothesis 2: This outside IP may do a port scan.

Actions and Observations
• Action: Checking IDS Alerts; Observation: Finding … in IDS Alerts
• Action: Checking Network Topology; Observation: Finding … in Network Topology
• Action: Checking Firewall logs; Observation: Finding … in Firewall logs
• Action: …; Observation: …

Operations on Actions and Observations

Look into Pilot1’s E-Tree: # of Nodes

Look into Pilot1’s Trace: # of Operations

Look into the E-Tree of Pilot1 (Cont’d)

Summary: A Slower Replay Video

Two Cases of Jumping Back
• Go back to a previous node
• Case 1
  - JUMP_FROM_TO ( H39431008 H46131157 )
  - ADD_SIBLING ( H46131157 H66431551 )
• Case 2
  - JUMP_FROM_TO ( H89931527 H58331044 )
  - CHANGE_TRUTH_VALUE ( H58331044 False, Unknown )
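The two cases differ in what the analyst does right after jumping back: adding an alternative hypothesis or revising a truth value. A short sketch of how that could be extracted automatically, assuming operations are available as text in the `NAME ( arg arg )` notation shown on this slide. The function names are hypothetical, and the actual ARSCA traces are stored as XML, so this stands in for whatever parsing the real trace format requires.

```python
import re

# Matches one operation written as: NAME ( arg1 arg2 ... )
_OPERATION = re.compile(r"(\w+)\s*\(\s*(.*?)\s*\)")


def parse_operations(text):
    """Return a list of (name, [args]) for every operation found in `text`."""
    operations = []
    for name, raw_args in _OPERATION.findall(text):
        # Arguments are separated by whitespace and/or commas.
        args = re.split(r"[,\s]+", raw_args) if raw_args else []
        operations.append((name, args))
    return operations


def actions_after_jumps(operations):
    """Pair each jump target with the name of the operation that follows it.

    A jump that is the last operation in the list has no follow-up
    and is skipped.
    """
    pairs = []
    for (name, args), (next_name, _) in zip(operations, operations[1:]):
        if name == "JUMP_FROM_TO":
            pairs.append((args[1], next_name))
    return pairs
```

Applied to the four operations above, this yields one pair per case, which makes it straightforward to count how often each kind of follow-up occurs across traces.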

Trace Comparison: # of Nodes in E-Tree • Chart annotation: more alternative hypotheses

Trace Comparison: E-Trees

Trace Comparison: Trace Operations

Initial Correlation • Correlated performance with E-Tree features:
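The slides do not say which correlation measure was used. As one possible illustration, the sketch below computes the Pearson coefficient between a per-analyst tree feature (for example, number of nodes) and a per-analyst performance score. The function name and the choice of Pearson are assumptions, and no experimental values are reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)
```

With only about ten traces, any coefficient computed this way is a rough indication at best, which is consistent with the slide calling the analysis initial.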

FY 2014 Plan
• Continue to conduct, in collaboration with ARL researchers, the Analytical Reasoning Experiment (VAST 2012)
• Analyze the traces of analytical reasoning
  - Is the first thought important for an analyst’s performance?
  - How will the key observation influence the analytical reasoning process?
  - What are the differences between the strategies used by experts and novices?
• Design and conduct, in collaboration with ARL researchers, a collaborative analytical reasoning experiment
  - Enables digging into flow data
  - Two-analyst teams
  - Leverages VAST 2013
• Enhance the context-guided experience-based analytical reasoning support
  - Aggregating multiple experiences of analysts
  - Support context-guided experience-based simulation

Q & A Thank you.
